- The paper demonstrates that AutoCommenter uses a T5-based model to automatically detect and comment on coding best practice violations.
- It is trained on roughly 800k best-practice examples drawn from a corpus of over 3 billion, and reached more than 80% positive developer feedback during deployment.
- The study shows that 40% of the model’s suggestions were resolved, enhancing code quality, reducing review time, and educating developers.
AutoCommenter: A Glimpse into Automated Code Reviews
Introduction
Modern code review has become a crucial part of the software development process, but it often requires considerable time and expertise. With the advent of LLMs, there is promising potential to automate some of these tasks. The paper presents "AutoCommenter," an LLM-based tool developed at Google to partially automate code review, in particular by detecting best-practice violations.
How AutoCommenter Works
The Concept and Implementation
AutoCommenter is designed to help developers adhere to coding best practices by automatically detecting violations and suggesting improvements. It is built on a T5 (Text-to-Text Transfer Transformer) model trained specifically to analyze code for adherence to best practices.
Here's a high-level overview of the process:
- Model Setup: The model frames review as a text-to-text task: given a code change as input text, it generates comments highlighting best-practice violations in the code.
- Training Data: The training corpus covers diverse tasks such as code-review comment resolution and next-edit prediction; of the more than 3 billion total examples, roughly 800k target best-practice violations.
- Inference: The model is invoked through an integrated service that developers interact with, either via their IDE or the code review system. It provides immediate feedback on code practices as changes are made, helping developers learn and adhere to best practices in real-time.
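The steps above — serialize the changed code into text, ask the model for a comment, and surface any result to the developer — can be sketched roughly as follows. The prompt layout and the `run_model` stub are hypothetical illustrations; the paper does not specify AutoCommenter's actual prompt format or serving interface.

```python
# Hypothetical sketch of a text-to-text commenting pipeline.
# Neither the prompt format nor the model call below is taken from
# the paper; run_model stands in for the trained T5-based model.

def build_prompt(filename: str, snippet: str) -> str:
    """Serialize a code change into a single text prompt for a
    text-to-text model."""
    return f"detect best-practice violations\nfile: {filename}\n{snippet}"

def run_model(prompt: str) -> str:
    """Stand-in for the trained model; a real system would call an
    inference service here. Empty string means 'no comment'."""
    if "TODO" in prompt:  # toy rule so the sketch is runnable
        return "Consider resolving or tracking this TODO before submitting."
    return ""

def review(filename: str, snippet: str) -> list[str]:
    """Return zero or more review comments for a snippet."""
    comment = run_model(build_prompt(filename, snippet))
    return [comment] if comment else []
```

In a real deployment, `review` would sit behind the integrated service that the IDE and code review system call as changes are made.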
Deployment Phases
AutoCommenter was rolled out in multiple stages:
- Team Testing: Initially evaluated by the project team.
- Early Adopters: Around 3,000 volunteer developers used the tool.
- A/B Experiment: Deployed to half of Google's developers to gain broader insights and gather substantial feedback.
- Full Release: After assessing the results and making necessary adjustments, the tool was made available to all Google developers.
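A staged rollout like the A/B experiment above is commonly implemented with deterministic bucketing, so that each developer consistently sees (or does not see) the tool across sessions. A minimal sketch of that general technique — not the mechanism described in the paper:

```python
import hashlib

def in_experiment(user_id: str, experiment: str, fraction: float = 0.5) -> bool:
    """Deterministically assign a user to the treatment group.

    Hashing user_id together with the experiment name yields a stable
    pseudo-random bucket in [0, 1]; users below `fraction` get the tool.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return bucket < fraction
```

Because the assignment depends only on the hash, the treatment and control halves stay fixed for the duration of the experiment, which keeps the feedback comparison clean.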
Key Results and Observations
Numerical Insights
- Developer Feedback: Feedback was crucial in refining AutoCommenter. The positive feedback ratio steadily improved, reaching over 80% by the time of full deployment.
- Comment Usage: AutoCommenter learned from real-world review comments and posts links to approximately 330 distinct best-practice URLs, covering 68% of the best practices most frequently referenced by human reviewers.
- Resolution Rate: About 40% of AutoCommenter’s suggestions were actively resolved by developers, which is a notable outcome considering these suggestions often deal with nuanced or subjective best practices.
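The two headline metrics above are simple aggregations over comment logs. A toy illustration with made-up data (the log schema here is assumed, not taken from the paper):

```python
def feedback_ratio(votes: list[bool]) -> float:
    """Fraction of thumbs-up votes among all explicit feedback votes."""
    return sum(votes) / len(votes) if votes else 0.0

def resolution_rate(comments: list[dict]) -> float:
    """Fraction of posted comments that the author marked as resolved."""
    if not comments:
        return 0.0
    return sum(c["resolved"] for c in comments) / len(comments)

# Toy log: 4 of 5 votes positive, 2 of 5 comments resolved.
votes = [True, True, True, True, False]
comments = [{"resolved": True}, {"resolved": True},
            {"resolved": False}, {"resolved": False}, {"resolved": False}]
assert feedback_ratio(votes) == 0.8
assert resolution_rate(comments) == 0.4
```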
Practical Implications
From a practical perspective, AutoCommenter:
- Enhances Code Quality: By offering timely feedback, it helps improve the quality of code, making sure that best practices are followed more rigorously.
- Saves Time: Reduces the time expert developers spend on reviewing basic best practice violations, letting them focus more on overall functionality and complex issues.
- Educates Developers: It acts as an on-the-job learning tool, especially beneficial for less experienced developers.
Lessons Learned
- Importance of High Precision: For developers to trust and adopt the tool, high precision in detecting relevant and useful comments was crucial.
- Handling Evolving Best Practices: Best practices change over time, so AutoCommenter includes mechanisms to suppress outdated rules without significant downtime.
- Balancing Intrinsic and Extrinsic Evaluations: While intrinsic evaluations during model training were helpful, real-world feedback was indispensable for refining and validating the tool's performance.
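The precision lesson above is often operationalized by surfacing only those candidate comments whose model score clears a tuned threshold, deliberately trading recall for developer trust. A hedged sketch of that idea — the scoring field and threshold value are assumptions, not details from the paper:

```python
def filter_comments(candidates: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep only candidate comments whose confidence score clears the
    threshold; raising the threshold raises precision at the cost of recall."""
    return [c for c in candidates if c["score"] >= threshold]

candidates = [
    {"text": "Prefer a descriptive variable name.", "score": 0.95},
    {"text": "Possible unused import.", "score": 0.60},
]
kept = filter_comments(candidates)  # only the high-confidence comment survives
```

The threshold itself would be tuned against real developer feedback, which is exactly where the extrinsic evaluation mentioned above comes in.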
Future of Automated Code Reviews
The success of AutoCommenter demonstrates that leveraging LLMs can significantly enhance the code review process. Nevertheless, there’s room for improvement, especially in increasing the coverage of best practices. Future advancements, such as models with larger context windows, promise to extend these capabilities further.
Conclusion
AutoCommenter shows the potential benefits of LLMs in automating parts of the code review process. With over 80% developer approval and a substantial portion of comments being correctly resolved, it’s a promising step towards more intelligent and efficient software development practices. As technology continues to evolve, tools like AutoCommenter will become increasingly integral to development workflows.