
AntiCopyPaster: Extracting Code Duplicates As Soon As They Are Introduced in the IDE (2112.15230v2)

Published 30 Dec 2021 in cs.SE

Abstract: We developed a plugin for IntelliJ IDEA called AntiCopyPaster, which tracks the pasting of code fragments inside the IDE and suggests the appropriate Extract Method refactoring to combat the propagation of duplicates. Unlike the existing approaches, our tool is integrated with the developer's workflow, and proactively recommends refactorings. Since not all code fragments need to be extracted, we developed a classification model to make this decision. When a developer copies and pastes a code fragment, the plugin searches for duplicates in the currently opened file, waits for a short period of time to allow the developer to edit the code, and finally infers the refactoring decision based on a number of features. Our experimental study on a large dataset of 18,942 code fragments mined from 13 Apache projects shows that AntiCopyPaster correctly recommends Extract Method refactorings with an F-score of 0.82. Furthermore, our survey of 59 developers reflects their satisfaction with the developed plugin's operation. The plugin and its source code are publicly available on GitHub at https://github.com/JetBrains-Research/anti-copy-paster. The demonstration video can be found on YouTube: https://youtu.be/_wwHg-qFjJY.
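The abstract states that, after a paste, the plugin searches the currently opened file for duplicates of the pasted fragment, but it does not describe the matching strategy. The sketch below assumes the simplest plausible one: blank lines are ignored, whitespace inside each line is collapsed, and a duplicate is any run of consecutive lines equal to the fragment. The names `normalize` and `find_duplicates` are hypothetical; this is an illustration, not AntiCopyPaster's actual implementation.

```python
from typing import List


def normalize(line: str) -> str:
    # Assumption: collapse runs of whitespace so indentation and spacing
    # differences do not hide a duplicate.
    return " ".join(line.split())


def find_duplicates(file_text: str, fragment: str) -> List[int]:
    """Return the 0-based line numbers where copies of `fragment` start.

    Hypothetical matcher: blank lines are ignored on both sides and the
    remaining lines must match one-to-one after normalization.
    """
    frag = [t for t in (normalize(l) for l in fragment.splitlines()) if t]
    if not frag:
        return []
    # Keep each non-blank line together with its original line number.
    indexed = [(i, normalize(l)) for i, l in enumerate(file_text.splitlines())]
    indexed = [(i, t) for i, t in indexed if t]
    texts = [t for _, t in indexed]
    n = len(frag)
    return [indexed[s][0] for s in range(len(texts) - n + 1)
            if texts[s:s + n] == frag]


# Usage: lines 5-6 hold a re-indented, re-spaced copy of lines 0-1.
example = """int a = x + 1;
int b = a * 2;

int c = x + 1;
int d = c * 2;
    int a = x + 1;
    int b  =  a * 2;
"""
pasted = "int a = x + 1;\nint b = a * 2;"
print(find_duplicates(example, pasted))  # → [0, 5]
```

In the IDE, the freshly pasted copy is itself one of the matches, so a check built on this sketch would look for at least two occurrences before treating the fragment as duplicated.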

AntiCopyPaster: Inline Extraction of Code Duplicates

The paper "AntiCopyPaster: Extracting Code Duplicates As Soon As They Are Introduced in the IDE" presents AntiCopyPaster, a plugin for IntelliJ IDEA that targets code duplication by recommending just-in-time Extract Method refactorings. The tool is integrated into the developer's workflow: it reacts at the moment code is pasted and proactively suggests a refactoring, with the aim of removing redundancy as it arises rather than after it has spread.

Overview

Unlike existing approaches, AntiCopyPaster is integrated into the development process itself. It monitors code pasted within the IDE and, after each paste, searches the currently opened file for duplicates of the fragment, waits briefly so the developer can edit the pasted code, and then lets a pre-trained classification model decide whether to recommend an Extract Method refactoring. Because not every duplicated fragment is worth extracting, the model acts as a filter: suggestions appear only when it deems them necessary, so the developer is interrupted as little as possible.
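The paste, wait, and classify sequence described in the abstract can be sketched as a small scheduler. This is a hypothetical illustration: `PendingPaste`, `RefactoringScheduler`, the `delay` value, and the `should_extract` callback (a stand-in for the trained classifier) are all assumptions, and the paper's actual waiting time and features are not reproduced.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingPaste:
    fragment: str    # current text of the pasted code; edits update it
    deadline: float  # time at which the refactoring decision is made


@dataclass
class RefactoringScheduler:
    """Hypothetical paste -> wait -> classify loop.

    `should_extract` stands in for the trained classifier and `delay` for
    the waiting period; neither value comes from the paper.
    """
    should_extract: Callable[[str], bool]
    delay: float = 5.0
    pending: List[PendingPaste] = field(default_factory=list)

    def on_paste(self, fragment: str, now: float) -> PendingPaste:
        # Defer the decision so the developer can adapt the pasted code first.
        paste = PendingPaste(fragment, now + self.delay)
        self.pending.append(paste)
        return paste

    def tick(self, now: float) -> List[str]:
        """Classify pastes whose waiting period is over; return those to extract."""
        due = [p for p in self.pending if p.deadline <= now]
        self.pending = [p for p in self.pending if p.deadline > now]
        return [p.fragment for p in due if self.should_extract(p.fragment)]


# Usage with a toy rule (not the paper's model): suggest extraction for
# fragments of three or more lines.
sched = RefactoringScheduler(should_extract=lambda t: len(t.splitlines()) >= 3)
paste = sched.on_paste("a = 1\nb = 2", now=0.0)
paste.fragment += "\nc = a + b"  # developer edits the copy while waiting
early = sched.tick(now=3.0)      # still inside the waiting period
late = sched.tick(now=5.0)       # decision uses the edited three-line fragment
print(early, late)
```

Deferring the decision means the classifier sees the fragment as the developer left it, not as it was first pasted, which is the point of the waiting period the abstract describes.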

Classification Model and Evaluation

A critical part of AntiCopyPaster is its machine learning-based refactoring decision. Duplicates are located by searching the currently opened file; a Convolutional Neural Network (CNN) then decides, based on a number of features, whether an Extract Method refactoring should be recommended. The approach was evaluated on a dataset of 18,942 code fragments mined from 13 Apache projects, where it reached an F-score of 0.82. Since the F-score is the harmonic mean of precision and recall, this value implies that both are reasonably high, not just one of them. It is, however, an offline result on mined code fragments: it supports the model's applicability to real-world projects but does not by itself establish how useful the recommendations are during everyday development.
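The claim that an F-score of 0.82 constrains both precision and recall can be made precise. Assuming the reported value is the balanced F-score (F1) computed from a single precision P and recall R (the paper's averaging scheme is not stated here), standard algebra gives the following bounds:

```latex
F_1 = \frac{2PR}{P+R}, \qquad m = \min(P,R), \quad M = \max(P,R) \le 1

% A harmonic mean lies between its two arguments:
m \le F_1 \le M

% Solve F_1 = 2mM/(m+M) for m; m decreases as M grows:
m = \frac{F_1 M}{2M - F_1}, \qquad
\frac{\partial m}{\partial M} = -\frac{F_1^2}{(2M - F_1)^2} < 0

% Hence m is smallest when M = 1:
m \ge \frac{F_1}{2 - F_1} = \frac{0.82}{1.18} \approx 0.69
```

So with F1 = 0.82, precision and recall are each at least about 0.69, and at least one of them is at least 0.82.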

Experimental Study and Results

Beyond the offline F-score, the authors surveyed 59 developers, whose responses reflect satisfaction with how the plugin operates. This suggests that proactive, in-IDE refactoring recommendations are acceptable to developers in practice. Together with the classification results, the survey supports embedding machine learning models within development tools to drive refactoring as code is written, although a satisfaction survey captures perceived usefulness rather than measured effects on productivity or code quality.

Implications and Future Directions

In practice, AntiCopyPaster lets developers deal with duplication without manually searching for clones or closely supervising the codebase. It also avoids the traditional assumption that developers have complete insight into the entire codebase, which is often unrealistic in large projects; because the tool reacts to the paste itself, it can flag duplication at the moment it is created. By offering Extract Method suggestions at that moment, the tool may encourage wider use of this refactoring and, in turn, improve maintainability.

For research, the work points toward applying machine learning-assisted refactoring to other refactoring types and other programming languages. Possible next steps include refining the decision model to reduce its overhead and improve its performance, and integrating the approach into other IDEs so that such tools can reach a broader range of development contexts.

In conclusion, AntiCopyPaster is a practical example of moving refactoring support to the moment a problem is introduced, with a learned model deciding when a suggestion is worthwhile. It offers a starting point for further research on real-time, machine learning-assisted decision support inside development tools.

Authors (10)
  1. Eman Abdullah AlOmar (32 papers)
  2. Anton Ivanov (8 papers)
  3. Zarina Kurbatova (9 papers)
  4. Yaroslav Golubev (40 papers)
  5. Mohamed Wiem Mkaouer (42 papers)
  6. Ali Ouni (36 papers)
  7. Timofey Bryksin (67 papers)
  8. Le Nguyen (11 papers)
  9. Amit Kini (2 papers)
  10. Aditya Thakur (3 papers)
Citations (7)