- The paper introduces a lightweight tool that integrates SBERT-based semantic embedding to detect linked code changes directly within Gerrit.
- It combines semantic similarity, file path metrics, and temporal features to outperform traditional baselines in Recall@K and MRR metrics.
- Experimental results on Qt, Android, and OpenStack show higher Recall@K and MRR than the baselines, which suggests reduced review latency, with sub-two-minute end-to-end usage.
SmartPatchLinker: An Open-Source Framework for Semantic Patch Linkage Detection in Code Review
Problem Statement and Motivations
Large-scale software ecosystems frequently encounter semantically related but independently submitted code changes, termed "linked changes." Failure to identify these linkages quickly during code review leads to redundant implementations, conflict resolution overhead, and increased review latency. Prior approaches relying on static similarity heuristics (TF-IDF on summaries and file path-based matching) are ineffective in the face of semantic variation and alternate solution strategies. While LLM-based solutions offer improved representation power, their typical realization as heavyweight, server-side bots is poorly suited to the seamless, privacy-sensitive, real-time interaction that reviewers need inside modern code review tools.
System Overview
SmartPatchLinker addresses these deficits via a browser-based tool that injects semantic patch linkage detection capabilities directly into the Gerrit code review interface. The core architecture decouples a lightweight Chrome extension UI from a local Python/Flask backend responsible for code analysis and inference.
Figure 1: SmartPatchLinker system architecture, showing seamless integration between Gerrit, the client-side Chrome extension, and the inference backend.
Upon page load, the extension automatically detects active Gerrit sessions, extracts patch context, and enables reviewers to configure analysis parameters such as temporal window and Top-K retrieval. The extracted context is securely communicated to the local backend, which executes candidate selection and similarity-based ranking. Predictions are rendered in situ, annotated with confidence indicators to support rapid reviewer judgment without the need to disrupt workflow or transfer sensitive data externally.
Figure 2: SmartPatchLinker's UI, showing reviewer interaction with time window, Top-K configuration, and result display within Gerrit's change view.
Model Architecture and Feature Engineering
SmartPatchLinker's backend leverages Sentence-BERT (all-MiniLM-L6-v2), enabling deep semantic embedding of patch titles and descriptions. For each candidate pair within the specified temporal window, the model extracts a feature vector comprising:
- SBERT-based semantic cosine similarity,
- File path similarity metrics (longest common prefix/suffix, Jaccard index on file lists),
- Temporal and structural meta-features (time delta, difference in file count).
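The feature vector listed above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the dictionary keys (`embedding`, `files`, `created`), the function names, and the choice of path-component depth as the prefix metric are assumptions made for illustration, and the suffix variant is omitted. The embedding is assumed to be precomputed, for example by an SBERT encoder applied to the patch title and description.

```python
import math
from datetime import datetime

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def jaccard(files_a, files_b):
    """Jaccard index of two file lists, treated as sets."""
    a, b = set(files_a), set(files_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def common_prefix_depth(path_a, path_b):
    """Number of leading path components shared by two file paths."""
    depth = 0
    for x, y in zip(path_a.split("/"), path_b.split("/")):
        if x != y:
            break
        depth += 1
    return depth

def max_prefix_depth(files_a, files_b):
    """Deepest shared prefix over all cross pairs of files (0 if a list is empty)."""
    return max(
        (common_prefix_depth(p, q) for p in files_a for q in files_b),
        default=0,
    )

def pair_features(change_a, change_b):
    """Build the illustrative feature vector for one candidate pair."""
    delta = change_a["created"] - change_b["created"]
    return [
        cosine_similarity(change_a["embedding"], change_b["embedding"]),
        jaccard(change_a["files"], change_b["files"]),
        max_prefix_depth(change_a["files"], change_b["files"]),
        abs(delta.total_seconds()) / 86400.0,  # time delta in days
        abs(len(change_a["files"]) - len(change_b["files"])),
    ]

# Hypothetical changes with toy 2-dimensional embeddings.
change_a = {"embedding": [1.0, 0.0],
            "files": ["src/gui/a.cpp", "src/gui/b.cpp"],
            "created": datetime(2024, 1, 1)}
change_b = {"embedding": [1.0, 0.0],
            "files": ["src/gui/a.cpp", "src/core/c.cpp"],
            "created": datetime(2024, 1, 3)}
features = pair_features(change_a, change_b)
```

A vector of this shape is what a downstream classifier would consume; the order and scaling of the entries here are arbitrary.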
Candidate selection is scoped by a reviewer-tunable window (default δ=14 days) to balance efficiency and recall. A Random Forest classifier, trained on labeled patch linkages, produces confidence scores. Top-K candidates are returned, prioritizing the most plausible semantic linkages.
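The windowing and Top-K steps can be sketched as follows. This is an assumption-laden illustration rather than the tool's code: the window is treated as symmetric around the query change, the record fields (`id`, `created`) are hypothetical, and `score_fn` is a stand-in for the classifier's confidence score (for instance, the positive-class probability of a trained Random Forest).

```python
from datetime import datetime, timedelta

def select_candidates(query, changes, window_days=14):
    """Keep changes created within +/- window_days of the query change."""
    window = timedelta(days=window_days)
    return [
        c for c in changes
        if c["id"] != query["id"]
        and abs(c["created"] - query["created"]) <= window
    ]

def rank_top_k(query, changes, score_fn, window_days=14, k=5):
    """Score every in-window candidate and return the k best as (id, score)."""
    candidates = select_candidates(query, changes, window_days)
    scored = [(c["id"], score_fn(query, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical data: a toy score that decays with the time gap in days.
def toy_score(q, c):
    return 1.0 / (1.0 + abs((q["created"] - c["created"]).days))

query = {"id": 1, "created": datetime(2024, 1, 10)}
changes = [
    query,
    {"id": 2, "created": datetime(2024, 1, 12)},  # 2 days away, kept
    {"id": 3, "created": datetime(2024, 1, 30)},  # 20 days away, dropped
    {"id": 4, "created": datetime(2024, 1, 1)},   # 9 days away, kept
]
top = rank_top_k(query, changes, toy_score, window_days=14, k=5)
```

Filtering before scoring keeps the number of classifier calls proportional to the window size, which is the efficiency and recall trade-off the window parameter controls.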
Experimental Evaluation
SmartPatchLinker was evaluated on three major OSS ecosystems (Qt, Android, OpenStack), using the datasets introduced by Wang et al. [wang2021automatic]. The experiments compare it against three baseline variants: text-only, file-location-only, and their static combination.
Quantitative results show that SmartPatchLinker outperforms the baselines in both Recall@K and MRR, especially at low K values, which matter most for interactive code review. For instance, on the Qt dataset with a 2-day window, SmartPatchLinker attains an MRR of 0.60 against 0.52 for the best baseline, and this margin persists across all projects and window settings.
Figure 3: Recall@K: SmartPatchLinker versus baselines, showing consistently higher recall across all K for Qt, Android, and OpenStack.
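For readers less familiar with the two metrics, the sketch below shows one common way to compute them. The exact definitions used in the paper are not given here, so this assumes that Recall@K is the per-query fraction of true links found in the top K, averaged over queries, and that MRR averages the reciprocal rank of the first true link (0 when none is retrieved). All names are illustrative.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k of a ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is present."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(results, k):
    """Average both metrics over a list of (ranked_ids, relevant_id_set) pairs."""
    n = len(results)
    if n == 0:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    return {
        "recall_at_k": sum(recall_at_k(r, rel, k) for r, rel in results) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in results) / n,
    }

# Hypothetical rankings for two query changes.
results = [
    (["a", "b", "c"], {"b"}),       # first true link at rank 2
    (["x", "y", "z"], {"z", "q"}),  # first true link at rank 3, "q" never retrieved
]
metrics = evaluate(results, k=2)
```

With these toy rankings, Recall@2 averages 1.0 and 0.0, and MRR averages 1/2 and 1/3, which shows why gains at low K translate into links appearing nearer the top of the reviewer's list.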
Notably, the model achieves high recall and optimal rank placement of relevant linkages even when lexical and path overlap is minimal, validating the benefit of semantic embedding. The system's real-time interactivity is evidenced by sub-two-minute end-to-end usage, encompassing detection, configuration, prediction, and inspection phases.
Workflow Integration and Usability
Reviewers access SmartPatchLinker via a Chrome extension popup, seamlessly configuring temporal and Top-K parameters before querying for linked changes. Results, including confidence and semantic badges, are displayed in the extension UI without leaving Gerrit.
Figure 4: Reviewer UI for setting the time window and Top-K retrieval count, tailoring results for immediate analysis.
Figure 5: Example Top-K results shown with percentage confidence, directly within the review session.
This workflow eliminates context-switching, supports dynamic exploration, and removes the administrative and privacy barriers typical of server-based or bot-style implementations.
Implications and Future Directions
SmartPatchLinker provides strong empirical evidence that semantic feature fusion and SBERT-based similarity yield substantive practical gains in early patch linkage detection. Its non-intrusive architecture and private local inference make it suited for organizations wary of code/data leakage. The significant recall improvements at low K suggest material reductions in duplicated effort and review latency for large engineering teams.
Possible future work includes cross-platform generalization (support for GitHub/GitLab), richer semantic reasoning across multi-branch and multi-repository linkages, and further integration with agentic or LLM-based assistants. Enhanced dependency summarization and alternative solution surfacing could further shift review dynamics toward more informed and context-aware decision-making. The practical fusion of real-time, privacy-preserving deployment with state-of-the-art semantic modeling positions SmartPatchLinker as a canonical approach for next-generation code review augmentation.
Conclusion
SmartPatchLinker advances the state of semantic patch linkage detection in code review through the fusion of SBERT-derived features and lightweight browser-native deployment. Its empirical results demonstrate marked superiority over traditional and hybrid baseline methods, particularly in scenarios with limited lexical or structural overlap. The tool's privacy-aware, workflow-preserving interaction model facilitates adoption in industrial review settings. Future extensions integrating LLM capabilities and supporting broader platforms present clear research and engineering opportunities for more contextually aware, automated code review support.
Reference: For full methodological details and supplementary resources, see "SmartPatchLinker: An Open-Source Tool to Linked Changes Detection for Code Review" (2604.04045).