CORRECT: Code Reviewer Recommendation at GitHub for Vendasta Technologies (1807.04130v1)

Published 9 Jul 2018 in cs.SE

Abstract: Peer code review locates common coding standard violations and simple logical errors in the early phases of software development, and thus, reduces overall cost. Unfortunately, at GitHub, identifying an appropriate code reviewer for a pull request is challenging given that reliable information for reviewer identification is often not readily available. In this paper, we propose a code reviewer recommendation tool--CORRECT--that considers not only the relevant cross-project work experience (e.g., external library experience) of a developer but also her experience in certain specialized technologies (e.g., Google App Engine) associated with a pull request for determining her expertise as a potential code reviewer. We design our tool using client-server architecture, and then package the solution as a Google Chrome plug-in. Once the developer initiates a new pull request at GitHub, our tool automatically analyzes the request, mines two relevant histories, and then returns a ranked list of appropriate code reviewers for the request within the browser's context. Demo: https://www.youtube.com/watch?v=rXU1wTD6QQ0

Citations (21)

Summary

  • The paper introduces CORRECT, a GitHub code reviewer recommendation system that uses cross-project experience from external libraries and technology skills.
  • Empirical evaluation shows CORRECT achieves over 92% Top-5 accuracy on industrial data and outperforms state-of-the-art methods like RevFinder.
  • Designed for practical use, CORRECT integrates via a browser extension to automate expertise discovery and potentially reduce code review bottlenecks.

Analysis of "CORRECT: Code Reviewer Recommendation at GitHub for Vendasta Technologies" (1807.04130)

This paper introduces CORRECT, a code reviewer recommendation system integrated within GitHub, specifically targeting Vendasta Technologies but designed for broader practical use. The primary innovation lies in leveraging both cross-project experience—measured through external library usage—and specialized technology experience to assess and rank potential reviewers for pull requests (PRs). This contrasts with prior systems, which predominantly rely on file path similarity or on code review history confined to a single project.

Methodological Overview

CORRECT adopts a client-server architecture, combining a Chrome browser extension for user interaction and a backend web service responsible for analysis and recommendation. The workflow consists of the following steps:

  1. Static Analysis of Submitted Pull Request: On PR initiation, the tool statically parses the changed source files to extract external libraries and specialized technologies referenced.
  2. Mining Past Pull Requests: It retrieves the last 30 closed PRs and their reviewers, extracting analogous feature sets (libraries, technologies) from their changed files.
  3. Similarity Assessment: For each past PR, the cosine similarity with the current PR is computed over the feature vectors comprising libraries and technology terms.
  4. Expertise Propagation and Reviewer Ranking: Each reviewer from similar past PRs accrues a relevance score weighted by the similarity metric. Summing across relevant PRs, CORRECT produces a ranked list of reviewer candidates, with the top five typically recommended.
  5. User Interaction: Recommendations appear directly within the GitHub PR workflow, so the developer can select a reviewer immediately or copy names into the PR review request.

This process is illustrated with a diagram in the paper (Figure: "Working methodology of CORRECT"), and a practical use case demonstrates its end-to-end application during a real PR submission.
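The mining and ranking steps (2–4) above can be sketched in Python. The function and variable names below are illustrative, not taken from the CORRECT implementation; each PR is reduced to a bag of library/technology tokens, and reviewers accumulate similarity-weighted votes from past PRs:

```python
from collections import Counter, defaultdict
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_reviewers(new_pr_tokens, past_prs, top_k=5):
    """Rank reviewers of past PRs by summed similarity to the new PR.

    new_pr_tokens: library/technology tokens extracted from the new PR
    past_prs:      list of (tokens, reviewers) tuples from closed PRs
    """
    new_vec = Counter(new_pr_tokens)
    scores = defaultdict(float)
    for tokens, reviewers in past_prs:
        sim = cosine_similarity(new_vec, Counter(tokens))
        # Expertise propagation: each reviewer of a similar PR
        # accrues a relevance score weighted by that similarity.
        for reviewer in reviewers:
            scores[reviewer] += sim
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]
```

Summing similarity-weighted votes per reviewer mirrors the expertise propagation described in step 4; the top five candidates form the recommendation list.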

Evaluation and Empirical Results

The authors conduct a comprehensive empirical evaluation:

  • Industrial Data: Using 13,081 PRs across 10 Vendasta repositories, CORRECT achieves a Top-5 accuracy of 92.15% (i.e., the actual reviewer is among the top 5 recommendations in 92.15% of cases). Mean Reciprocal Rank (MRR) is 0.67, with precision and recall at 85.93% and 81.39%, respectively.
  • Comparison to State-of-the-Art: Against the leading RevFinder tool, CORRECT secures an 11.43% Top-5 accuracy improvement and about 10% better precision and recall, with statistical significance across the examined datasets.
  • Generality: In experiments on 4,034 PRs spanning six open source projects in Java, Python, and Ruby, CORRECT sustains strong performance (Top-5 accuracy: 85.2%, MRR: 0.69) and shows no significant bias toward project language or type.
  • Statistical Rigor: The improvements are validated using Mann-Whitney U tests together with Cohen's d and Glass Δ effect sizes, confirming the robustness of the observed gains.

Notable Claims

  • Cross-project and Technology Experience: CORRECT claims that leveraging cross-project experience via external library analysis, as well as explicit handling of specialized technology skills, yields more effective and robust reviewer recommendations compared to mechanisms centered on file path or directory similarity.
  • Industrial Relevance: The method is asserted to be directly applicable in both open source and closed source (industrial) environments without language-specific tuning.
  • Practicality: A deployment as a Chrome browser extension enables seamless developer adoption within existing workflows, minimizing context switching and associated cognitive overhead.

Implementation Considerations

For implementation in production environments:

  • API Constraints: CORRECT depends on the GitHub API for PR and review history; per-user OAuth authentication is used so that requests count against each user's own rate limit rather than a shared organizational quota.
  • Performance: Average recommendation latency is 10–15 seconds per PR, with further optimization via multi-threaded API access and client-side result caching.
  • Extensibility: While the initial plug-in targets Chrome, the REST API backend could be readily consumed by extensions for other browsers or integrated with continuous integration pipelines.
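As an illustration of the API access pattern, the sketch below fetches the most recent closed PRs for a repository with a per-user token. The helper names are hypothetical; the endpoint `GET /repos/{owner}/{repo}/pulls?state=closed` is part of the standard GitHub REST API:

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"

def build_request(owner, repo, token, per_page=30):
    """Build an authenticated request for the most recent closed PRs.

    Authenticating with a per-user OAuth token means each developer's
    requests count against their own rate limit, not a shared quota.
    """
    url = (f"{GITHUB_API}/repos/{owner}/{repo}/pulls"
           f"?state=closed&per_page={per_page}")
    return urllib.request.Request(url, headers={
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    })

def fetch_closed_prs(owner, repo, token, per_page=30):
    """Return the JSON list of closed pull requests."""
    req = build_request(owner, repo, token, per_page)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The `per_page=30` default matches the 30-PR history window described in the methodology.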

Trade-offs and Limitations

  • History Length Trade-off: The reliance on the 30 most recent closed PRs balances recency and computational overhead, but may overlook longer-term or less frequently occurring expertise.
  • Cold Start: New projects or those with little review history may yield less accurate recommendations until sufficient data accrues.
  • Domain Adaptation: While experiments indicate language and domain robustness, projects with atypical file structures or unconventional dependency usage may require custom adaptation.

Implications and Future Directions

Practical Implications

  • Automated Expertise Discovery: CORRECT systematically extracts latent expertise across project boundaries, aiding both novice and experienced developers in reviewer assignment.
  • Scalability and Integration: Its modular architecture accommodates organizational scaling and future integration into diverse developer tools and platforms.
  • Reduction of Review Bottlenecks: By optimizing reviewer selection, CORRECT may reduce review latency and improve code quality, particularly in organizations encouraging internal code reuse and rapid technology adoption.

Theoretical Implications

  • Feature Expansion: The strong empirical gains of library and technology-centered similarity metrics suggest further investigation into richer, multi-modal expertise features (e.g., dependency graphs, API call patterns).
  • Dynamic Learning: Incorporating adaptive learning from post-review outcomes, such as reviewer feedback or review approval rates, could further refine recommendation relevance.

Speculation on Future Developments

  • Generalization to Other Collaboration Platforms: CORRECT's methodological framework could extend to Bitbucket, GitLab, or enterprise code review systems with analogous revision and review histories.
  • Integration with LLM-based Code Understanding: Leveraging code embeddings or LLMs may facilitate more nuanced expertise and code similarity assessment, moving beyond static token-based features.
  • Contextual and Social Signals: Enriching recommendations with developer availability, workload, or prior collaborative efficacy may yield further organizational efficiency gains.

Conclusion

CORRECT presents a systematically validated, practically deployable approach to reviewer recommendation that addresses limitations of prior art through explicit modeling of cross-project and specialized technology experience. Its architectural decisions, empirical rigor, and industrial applicability reinforce its value for both research and applied settings, establishing a new baseline for automated reviewer suggestion in collaborative software development workflows.