- The paper shows that a one-shot prompt enabled GPT-4 to produce a fully functional API migration with all tests passing.
- It systematically compares zero-shot, one-shot, and chain-of-thought approaches, highlighting key metrics like test outcomes and type-checking errors.
- It identifies challenges such as manual fixture adjustments and outlines future directions for prompt refinement and broader evaluations.
Automatic Library Migration Using LLMs: An Essay
The paper "Automatic Library Migration Using LLMs: First Results" presents a comprehensive paper on utilizing LLMs, particularly GPT-4, to support API migration tasks in software engineering. Conducted by Almeida, Xavier, and Valente, this research addresses the automation challenge in API migration, an area demanding significant manual effort and precision from developers.
Summary of the Study
The primary focus of the paper is migrating a client application to a newer version of SQLAlchemy, a popular Object-Relational Mapping (ORM) library in the Python ecosystem. The research evaluates the efficacy of three prompting methods, Zero-Shot, One-Shot, and Chain-of-Thought, with the goal of establishing the most effective approach for leveraging ChatGPT in automating API migration.
Methodology
Target API and Application
The authors selected SQLAlchemy due to its significance and widespread usage in Python applications. The upgrade targeted a transition from SQLAlchemy version 1 to version 2, which introduced major enhancements, including first-class support for Python's typing module and improved support for asynchronous operations through asyncio. The client application chosen for the study was a FastAPI-based TODO list implementation connected to a PostgreSQL database, providing a realistic use case for evaluating migration effectiveness.
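The core of the version 2 transition is SQLAlchemy's typed declarative mapping. A minimal before/after sketch of that change, using a hypothetical Todo model as a stand-in for the paper's actual application code:

```python
# Illustrative sketch of the SQLAlchemy 1.x -> 2.0 style change; the Todo
# model is a hypothetical stand-in, not the paper's actual source code.

# SQLAlchemy 1.x declarative style:
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

OldBase = declarative_base()

class TodoV1(OldBase):
    __tablename__ = "todos"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)

# SQLAlchemy 2.0 typed declarative style:
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class TodoV2(Base):
    __tablename__ = "todos"
    id: Mapped[int] = mapped_column(primary_key=True)   # typed, checkable by Pyright
    title: Mapped[str] = mapped_column(String, nullable=False)
```

The 2.0 style is exactly what makes Pyright results a meaningful metric: the `Mapped[...]` annotations give the type checker something to verify.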
Prompts and Migration Process
Three types of prompts were assessed:
- Zero-Shot Prompt: Provided no examples, relying solely on the task description.
- One-Shot Prompt: Included an example of the required migration, offering a concrete reference for the model (a hedged sketch of such a prompt follows this list).
- Chain-of-Thought Prompt: Featured a step-by-step guide plus an example to break down the migration process comprehensively.
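To make the one-shot idea concrete, here is a sketch of how such a prompt might be assembled. The wording, the embedded example, and the `build_prompt` helper are illustrative assumptions, not the authors' actual prompt text:

```python
# Hypothetical one-shot prompt skeleton; not the paper's verbatim prompt.
ONE_SHOT_PROMPT = """\
Migrate the following Python code from SQLAlchemy 1.x to SQLAlchemy 2.0.

Example of the expected change:
Before: id = Column(Integer, primary_key=True)
After:  id: Mapped[int] = mapped_column(primary_key=True)

Code to migrate:
{source_code}
"""

def build_prompt(source_code: str) -> str:
    """Fill the template with the client code to be migrated."""
    return ONE_SHOT_PROMPT.format(source_code=source_code)
```

A zero-shot variant would omit the Before/After example, while a chain-of-thought variant would prepend numbered migration steps for the model to follow.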
Each prompt aimed to guide GPT-4 in migrating the application code and, subsequently, the application's tests to ensure complete functionality after the migration. Key evaluation metrics included the number of passing tests, Pylint scores, Pyright type-checking results, and detailed inspections of the migrated columns and methods.
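A small harness along the following lines could collect the same metrics. The `app/` and `tests/` paths are assumptions about project layout; the pytest, Pylint, and Pyright invocations themselves are standard CLI usage:

```python
# Sketch of a metrics-collection harness; paths are hypothetical.
import json
import subprocess

def run(cmd: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)

tests = run(["pytest", "tests/", "-q"])           # test outcomes
lint = run(["pylint", "app/"])                    # Pylint score appears in stdout
types = run(["pyright", "app/", "--outputjson"])  # machine-readable type report

summary = json.loads(types.stdout)["summary"]
print("pytest exit code:", tests.returncode)      # 0 means all tests passed
print("pyright errors:  ", summary["errorCount"])
```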
Results and Analysis
Application Code Migration
- Zero-Shot Prompt: This approach yielded the least effective results. The migrated code contained numerous typing and import errors, including broken import statements and misuse of Python's typing features, which prevented the application from running at all.
- One-Shot Prompt: Demonstrated significantly better performance, producing a running application in which all tests passed. It migrated all required columns and methods correctly, showing the value of giving the LLM a concrete example. However, it produced a higher number of Pyright type errors, indicating residual issues in the type annotations.
- Chain-of-Thought Prompt: Performed second best; a minor import error prevented execution, but the output achieved the lowest Pyright type error count. This result suggests that step-by-step guidance can improve accuracy on more complex aspects of the task.
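For a sense of what migrating "methods" involves beyond column declarations, a typical query-level change looks like this, reusing the hypothetical TodoV2 model from the earlier sketch (illustrative, not taken from the paper's artifact):

```python
# Hypothetical query migration from the legacy Query API to 2.0-style select().
from sqlalchemy import select
from sqlalchemy.orm import Session

def fetch_todos(session: Session) -> list:
    # SQLAlchemy 1.x style (legacy Query API):
    #   return session.query(TodoV2).filter(TodoV2.title == "groceries").all()
    # SQLAlchemy 2.0 style (select() statement executed on the session):
    stmt = select(TodoV2).where(TodoV2.title == "groceries")
    return list(session.execute(stmt).scalars())
```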
Tests Migration
Despite syntactically correct migration of the tests, the application's fixture setup, which resets the database state between tests, was migrated incorrectly. The root cause was a change in default behavior: SQLAlchemy 2 removed library-level autocommit, so statements are no longer committed implicitly. Resolving this issue required manual intervention, highlighting a subtle yet common migration challenge.
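A plausible shape of the manual fix, assuming a pytest fixture that clears tables between tests; the table name, fixture, and connection URL are hypothetical, but the explicit-transaction pattern is standard SQLAlchemy 2.0:

```python
# Hypothetical pytest fixture; SQLAlchemy 2.0 has no implicit autocommit,
# so cleanup must run inside an explicit transaction via engine.begin().
import pytest
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/todos")  # placeholder URL

@pytest.fixture(autouse=True)
def clean_database():
    # Under 1.x, a bare DELETE executed on the engine would autocommit;
    # under 2.0, engine.begin() opens a transaction and commits on exit.
    with engine.begin() as conn:
        conn.execute(text("DELETE FROM todos"))
    yield
```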
Practical and Theoretical Implications
Practically, this research underscores the burgeoning potential of LLMs in software maintenance tasks such as API migration. The findings suggest that while LLMs, specifically GPT-4, can perform remarkably well when provided with suitable examples or detailed guidance, there remain inherent challenges in handling nuanced changes in library behaviors. Theoretically, the paper stresses the importance of prompt engineering and sheds light on the capabilities and limitations of current LLM implementations within the domain of software engineering.
Future Directions
The paper outlines several future directions:
- Broadening the Evaluation: Extending the evaluation to other programming languages and libraries can provide a more comprehensive understanding of the LLMs' generalizability.
- Improving Prompts: Exploring more intricate prompting strategies, such as Few-Shot or Chain-of-Symbols, and refining the current prompts to improve task execution.
- Empirical Usage: Validating the framework in real-world scenarios through developer studies and actual GitHub project integrations.
- Diverse LLMs: Evaluating newer LLMs from other providers, such as Google's Gemini and Amazon Q, to compare performance across different architectures.
In conclusion, this paper provides a substantive first step towards utilizing LLMs for automating API migrations, fostering a foundational understanding that can spur further advancements in the effective application of artificial intelligence in software development.