- The paper demonstrates that about 15% of Copilot-generated Python files contain code smells, with 'Multiply-Nested Container' being the most prevalent.
- The study employs a keyword-based extraction method and Pysmell for detecting and classifying code smells within dynamically typed Python code.
- It reveals that detailed prompt structures make Copilot Chat significantly more effective at removing code smells and improving code quality, although the fixes may sometimes introduce new issues.
Introduction
AI-assisted development tools like GitHub Copilot have become an integral part of many coding workflows. These tools bring the power of large language models (LLMs) to developers, generating code from contextual snippets and natural-language prompts, but the quality of that output, and in particular the prevalence of code smells, remains a notable concern. Beiqi Zhang, Peng Liang, Qiong Feng, Yujia Fu, and Zengyang Li have undertaken a structured evaluation of code smells in Python code generated by GitHub Copilot, also analyzing how effectively Copilot Chat addresses these smells.
Methodology
The researchers assembled a dataset of 102 code smell instances drawn from Python code generated by Copilot. Given Python's dynamic nature, they focused on smells that can impact readability and maintainability. The evaluation centered on two research questions: how often code smells occur in the generated code, and how effectively Copilot Chat can fix them.
Using a keyword-based mining approach, the team extracted candidate Python files from GitHub that appeared to be Copilot-generated. Pysmell, a detection tool tailored to Python code smells, then scanned these files. The detected smells were classified by type, and the authors manually verified that every listed code smell indeed originated from Copilot-generated code. A rough sketch of such a mining-and-detection pipeline appears below.
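The paper does not reproduce its mining scripts here, so the snippet below is only a minimal sketch of what a keyword-based filter plus a smell-detection pass might look like. The keyword list, the directory layout, and the way Pysmell is invoked are all assumptions made for illustration, not the authors' actual setup.

```python
import subprocess
from pathlib import Path

# Hypothetical markers suggesting a file was saved as Copilot output.
# The actual keyword list used in the study is not reproduced here.
COPILOT_KEYWORDS = ("github copilot", "generated by copilot", "copilot suggestion")


def looks_copilot_generated(path: Path) -> bool:
    """Keyword-based filter: keep files whose text mentions Copilot."""
    text = path.read_text(encoding="utf-8", errors="ignore").lower()
    return any(keyword in text for keyword in COPILOT_KEYWORDS)


def scan_with_pysmell(path: Path) -> str:
    """Hand the file to a smell detector.

    Pysmell is a research prototype; the command line below is a placeholder
    for however the tool is actually invoked in your environment.
    """
    result = subprocess.run(
        ["python", "pysmell.py", str(path)],  # placeholder invocation
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # "mined_repos" is an assumed local directory of cloned repositories.
    candidates = [p for p in Path("mined_repos").rglob("*.py") if looks_copilot_generated(p)]
    for candidate in candidates:
        print(candidate, scan_with_pysmell(candidate))
```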
Results
The researchers noted that approximately 15% of the evaluated files contained code smells, with "Multiply-Nested Container" being the most prevalent. Copilot-generated Python code is thus not immune to suboptimal patterns that can increase error proneness and hinder maintainability; an illustrative example of the most common smell follows.
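The paper reports this smell by name; the fragment below is an illustrative example (not taken from the paper or from Copilot output) of what a Multiply-Nested Container looks like, and one common way to flatten it.

```python
from collections import namedtuple

# Illustrative "Multiply-Nested Container": a list nested inside a dict inside
# another dict, so every access needs a chain of lookups that is easy to get
# wrong and hard to reason about.
schedule = {
    "team_a": {
        "monday": [("standup", 9), ("review", 14)],
        "tuesday": [("planning", 10)],
    },
}

# Reading a single entry requires knowing the full nesting path.
first_meeting = schedule["team_a"]["monday"][0][0]  # "standup"

# A flatter alternative: one record per meeting keeps the structure a single
# level deep and self-describing.
Meeting = namedtuple("Meeting", ["team", "day", "title", "hour"])
meetings = [
    Meeting("team_a", "monday", "standup", 9),
    Meeting("team_a", "monday", "review", 14),
    Meeting("team_a", "tuesday", "planning", 10),
]
```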
The paper also shed light on Copilot Chat, a beta service positioned to improve code quality through natural-language interaction. Using different prompt structures, the researchers evaluated Copilot Chat's ability to fix the detected smells, and found that a more detailed prompt structure was significantly more effective than a simple one. A hypothetical illustration of the two prompt styles follows.
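The exact prompt templates are defined in the paper; the pair below is only a hypothetical illustration of the difference between a terse request and a detailed one that names the smell and states the desired outcome.

```python
# Hypothetical prompts for Copilot Chat; not the templates from the paper.
simple_prompt = "Fix this code."

detailed_prompt = (
    "This function contains a Multiply-Nested Container code smell: "
    "a dict of dicts of lists. Refactor it to reduce the nesting, "
    "keep the existing behavior, and do not introduce new code smells."
)
```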
Discussion and Implications
The paper indicates that while Copilot Chat shows promise in addressing Python code smells, it can also introduce additional smells during the fixing process. Developers should therefore remain cautious and use detailed, specific prompts to guide Copilot Chat. These insights could inform improvements to automated code generation tools, helping keep code quality high and technical debt low in AI-assisted development environments.
Further exploration of how code smells are handled in other languages, and of the effects of varied prompt structures, could pave the way toward more robust AI-powered coding assistants. This research provides a foundation for continued work on code generation tools, optimizing them for industry-scale use without compromising code quality.