RefactoringMiner: Code Refactoring Detection
- RefactoringMiner is an open-source tool suite offering state-of-the-art automated detection of refactorings in source code version histories.
- It employs a four-phase detection pipeline with top-down and bottom-up AST differencing, multi-mapping, and semantic compatibility checks.
- Empirical benchmarks demonstrate high precision and recall in both Java and C++, supporting applications in code review, program repair, and API migration.
RefactoringMiner is a state-of-the-art open-source tool suite for the automated detection of refactorings in source code version histories, primarily supporting Java and, in its extended form RefactoringMiner++, C++. The tool is distinguished by its refactoring-aware abstract syntax tree (AST) differencing algorithms, multi-mapping support, semantic compatibility checks, and evaluation methodology, which collectively provide a robust foundation for empirical studies, code review automation, and program analysis at scale (Ottenhof et al., 28 Jan 2026, Ritz et al., 24 Feb 2025, Alikhanifard et al., 2024).
1. Detection Principles and Core Algorithms
RefactoringMiner operates by applying a four-phase detection pipeline over successive versions of source code artifacts:
- Phase 1: Top-Down Matching: Entity-level matching of types, methods, and fields is performed when signatures are identical, propagating this process recursively along inheritance hierarchies.
- Phase 2: Bottom-Up Matching: For unmatched entities, all pairs of candidate elements (e.g., methods, fields) are considered. Statement-level mapping calculates matches between leaf and composite statements, enabling detection of renames, signature modifications, merges/splits, and extract/inline operations. This phase tracks replacements to infer fine-grained body-level refactorings.
- Phase 3: Class-Level Refactoring Detection: Unmatched types are paired based on member signature intersections, supporting class renames, moves, merges/splits, and extractions. Subsequent entity matching recurses on these pairs.
- Phase 4: Inter-File Refactoring Detection: Remaining unmatched entities are compared across the entire codebase to detect PULL-UP/PUSH-DOWN, MOVE, EXTRACT CLASS/SUPERCLASS, and inter-file moves.
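The hand-off between Phase 1 and the later phases can be illustrated with a minimal sketch. It is not RefactoringMiner's implementation: the `Method` record and the `match_identical_signatures` helper are hypothetical names, and a real signature carries more information (return type, modifiers, and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical, simplified signature record (illustration only)."""
    owner: str     # declaring type
    name: str      # method name
    params: tuple  # parameter type names, in order


def match_identical_signatures(before, after):
    """Phase-1-style pass: pair entities whose signatures are identical.

    Returns (matched, removed, added). The unmatched 'removed' and 'added'
    entities are what a later, similarity-based phase would compare.
    """
    remaining = {m: m for m in after}
    matched, removed = [], []
    for m in before:
        if m in remaining:
            matched.append((m, remaining.pop(m)))
        else:
            removed.append(m)
    added = list(remaining.values())
    return matched, removed, added
```

In this sketch, a method renamed from `bar` to `baz` appears once in `removed` and once in `added`; only a later phase that compares method bodies could recognize the pair as a rename.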
Matching is formalized using bipartite graph construction over AST nodes, with a normalized Levenshtein-based name similarity used to score candidate node pairs. Pairs whose similarity exceeds the threshold are greedily matched under one-to-one constraints.
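As a rough illustration of this kind of matching, the sketch below uses one common normalization of Levenshtein distance (one minus the distance divided by the longer string's length) and a simple greedy one-to-one assignment. The exact similarity formula and threshold used by RefactoringMiner are not given here, so the normalization, the function names, and the threshold of 0.4 in the usage note are all assumptions.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def name_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer name."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def greedy_match(left, right, threshold):
    """Greedily pair names by descending similarity, one-to-one."""
    candidates = sorted(
        ((name_similarity(l, r), l, r) for l in left for r in right),
        key=lambda t: -t[0],
    )
    used_left, used_right, pairs = set(), set(), []
    for sim, l, r in candidates:
        if sim < threshold:
            break  # all remaining candidates score lower
        if l not in used_left and r not in used_right:
            pairs.append((l, r))
            used_left.add(l)
            used_right.add(r)
    return pairs
```

With the assumed threshold of 0.4, `greedy_match(["getName", "computeTotal"], ["getFullName", "calcTotal"], 0.4)` pairs `getName` with `getFullName` and `computeTotal` with `calcTotal`. Greedy assignment is simple but not guaranteed to find the globally optimal matching.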
Composite structural similarity is quantified for refactorings such as Move Class and Extract Method by evaluating the largest shared subtree between entities; a pairing is treated as valid when this similarity meets a set threshold.
Detected edit scripts are classified into more than 40 (Java) and more than 100 (C++) refactoring patterns using rule-based templates for changes (e.g., annotation additions, parameter renames).
2. Semantic, Multi-Mapping, and Refactoring-Aware Enhancements
RefactoringMiner 3.0 introduces several algorithmic advances (Alikhanifard et al., 2024):
- Multi-Mapping: Explicit handling of one-to-many and many-to-one statement mappings, critical for tracking duplicated code elimination (e.g., merged switch-case branches).
- Semantic Compatibility: Matches between program elements are constrained to ensure semantic coherence (e.g., method parameters matched only to corresponding parameter nodes). Top-down and subtree matching block incompatible SimpleName mappings.
- Refactoring Awareness: Import statement diffing is correlated with detected structural refactorings (e.g., MOVEs/RENAMES of classes lead to corresponding import modifications), improving the completeness of change representation.
- Statement Mapping Optimization: Redundant mappings (arising from extract/inline patterns) are de-duplicated based on a multi-criteria sorter considering edit distance, parent edit distance arrays, nesting depth, and positional indexes.
- Call Site Restriction: For EXTRACT/INLINE, mappings are governed by the call-site context, enhancing the granularity and accuracy of extraction tracking.
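The multi-criteria de-duplication described above can be pictured as a lexicographic ranking over candidate mappings. The sketch below is illustrative only: the `CandidateMapping` fields and the priority order of the criteria are assumptions, not the ordering RefactoringMiner actually applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateMapping:
    """Hypothetical record for one candidate statement mapping."""
    left: str                     # statement text before the change
    right: str                    # statement text after the change
    edit_distance: int            # distance between the two statements
    parent_edit_distances: tuple  # distances between enclosing statements
    depth_diff: int               # absolute difference in nesting depth
    index_diff: int               # absolute difference in position


def ranking_key(mapping):
    """Lexicographic key: lower is better on every criterion (assumed order)."""
    return (
        mapping.edit_distance,
        mapping.parent_edit_distances,
        mapping.depth_diff,
        mapping.index_diff,
    )


def pick_best(candidates):
    """Keep the single best-ranked mapping among redundant candidates."""
    return min(candidates, key=ranking_key)
```

Because the key is a tuple, later criteria only break ties left by earlier ones; two candidates with equal statement edit distance are separated by how closely their parent statements match.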
3. Architecture and Cross-Language Adaptation: RefactoringMiner++ for C++
RefactoringMiner++ generalizes the detection engine to C++ (Ritz et al., 24 Feb 2025) by:
- Parsing C++ code via libClang to produce complete ASTs.
- Translating Clang AST models into the unified in-memory entity model used by the Java-centric engine.
- Serializing models as JSON, enabling the Java engine to process them without core algorithmic modifications.
- Mapping C++ constructs (free functions/variables, templates, multiple inheritance, namespaces) into equivalent Java-style representations through artificial wrappers, UMLGeneralization lists, and generics analogs.
- Flagging non-refactoring, behavior-altering statement-level changes through a dedicated extension.
The core model-matching and AST differencing algorithms of RefactoringMiner remain unchanged, providing consistent refactoring detection strategies across both languages.
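To make the wrapper idea concrete, the following sketch groups the free functions of a namespace under a synthetic class-like entity and serializes it as JSON. The field names (`kind`, `synthetic`, `methods`) and the naming scheme are hypothetical; the actual JSON schema used by RefactoringMiner++ is not described here.

```python
import json


def wrap_free_functions(namespace, functions):
    """Group free functions under a synthetic class-like entity.

    `functions` is a list of (name, parameter_types) pairs. Every key in the
    returned dictionary is a hypothetical choice made for this illustration.
    """
    return {
        "kind": "class",
        "name": f"{namespace}::<free>",  # hypothetical naming scheme
        "synthetic": True,               # marks the wrapper as artificial
        "methods": [
            {"name": name, "parameters": list(params)}
            for name, params in functions
        ],
    }


model = wrap_free_functions("geometry", [("area", ["double", "double"])])
payload = json.dumps(model, sort_keys=True)
```

A consumer that only understands class-and-method models can then treat `area` as a member of the synthetic entity, which is the property that lets a Java-oriented engine run without changes to its core algorithms.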
4. Empirical Performance, Benchmarking, and Evaluation
Extensive benchmarking demonstrates RefactoringMiner’s accuracy and performance:
- Precision, Recall, F1:
  - Java (RefactoringMiner 2.0): Precision = 97.96%, Recall = 87.20%, F1 = 92.27% (Ottenhof et al., 28 Jan 2026).
  - Java (RefactoringMiner 3.0): Precision = 99.8%, Recall = 99.6%, F1 = 99.7% for statement-level mappings (Alikhanifard et al., 2024).
  - C++ (RefactoringMiner++): Precision = 1.00, Recall = 1.00 on seeded synthetic benchmarks (Ritz et al., 24 Feb 2025).
- Multi-Mappings: Only RefactoringMiner supports multi-mapping detection (Precision = 99.7%, Recall = 98.4%), whereas competing tools reach a recall below 11% (Alikhanifard et al., 2024).
- Semantic Violations: RefactoringMiner exhibits zero semantic mapping incompatibilities on key AST types, while other tools show 85–1288 violations.
- Inter-File Moves: RefactoringMiner uniquely detects inter-file mappings (Precision/Recall = 99.6%), with all others at recall = 0%.
- Execution Time: The median per-commit execution time of RefactoringMiner 3.0 is 60.5 ms, slower than GumTree simple (8.0 ms) but faster than MTDiff (75.5 ms) (Alikhanifard et al., 2024).
Benchmarking leverages manually constructed ground-truth mappings over large public datasets (e.g., Defects4J: 800 bug-fix commits, Refactoring Oracle: 188 refactoring commits).
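The F1 values reported above are consistent with the standard definition of F1 as the harmonic mean of precision and recall. The short check below recomputes them from the published precision and recall figures; the function name `f1_score` is simply a label chosen for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in this section, in percent.
rm2_f1 = round(f1_score(97.96, 87.20), 2)  # RefactoringMiner 2.0
rm3_f1 = round(f1_score(99.8, 99.6), 1)    # RefactoringMiner 3.0
print(rm2_f1, rm3_f1)
```

This reproduces 92.27 and 99.7, matching the reported F1 scores.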
5. Practical Applications and Empirical Study Use Cases
RefactoringMiner is widely employed in studies of software engineering practices:
- Agentic vs. Human Refactoring: In large-scale evaluation of agent-generated pull requests in Java (Ottenhof et al., 28 Jan 2026), RefactoringMiner detects and classifies all refactorings over thousands of commits, enabling rigorous comparison of refactoring type distributions, volumes, and code quality impacts. Aggregate statistics show agents produce substantially more refactorings per commit (mean per commit for Claude Code = 762.73; developers = 15.34), with annotation changes dominating agentic refactorings.
- Code Review Augmentation: Refactoring-aware diffs increase reviewer efficiency and clarity by accurately grouping extracted blocks and structural changes (Alikhanifard et al., 2024).
- Automated Program Repair: High-fidelity AST mappings improve the mining and synthesis of bug-fix patterns while reducing mapping noise.
- API/Library Migration: Reliable detection of signature changes and extractions supports the synthesis of migration recipes.
- Educational and Industrial Integration: RefactoringMiner++ facilitates grading and review acceleration in C++-based contexts.
6. Limitations, Open Questions, and Ongoing Development
- Configuration: RefactoringMiner is typically applied with default heuristics and thresholds; customization is limited and rarely reported in empirical studies (Ottenhof et al., 28 Jan 2026).
- Manual Validation: Large dataset sizes render manual labeling impractical; reliance on RefactoringMiner’s published accuracy metrics is standard practice.
- Language Support: While core algorithms are language-independent, cross-language extension requires reimplementation of declaration extraction and refactoring rules (Ritz et al., 24 Feb 2025).
- C++ Challenges: Multi-file support, macro expansion, lambda/nested class handling, and template-heterogeneity are incomplete in RefactoringMiner++ (Ritz et al., 24 Feb 2025).
- Benchmarking: The RefactoringMiner 3.0 benchmark includes multi-mapping and semantic compatibility cases; further research is needed on robust handling of tangled commits and richer LLM-generated code change test suites.
- Visualization and Diff Interpretation: The development of grouped, refactoring-annotated diff visualizations and natural-language change explanations is an active research area enabled by RefactoringMiner’s mapping precision (Alikhanifard et al., 2024).
7. Comparative Table: RefactoringMiner vs. Competitors (Overall Statement Level, Java) (Alikhanifard et al., 2024)
| Tool | Precision | Recall | F1 |
|---|---|---|---|
| RefactoringMiner 3.0 | 99.8% | 99.6% | 99.7% |
| MTDiff | 97.1% | 91.6% | 94.3% |
| GumTree 3.0 simple | 97.2% | 91.3% | 94.2% |
| GumTree 3.0 greedy | 97.3% | 91.2% | 94.2% |
| IJM | 97.5% | 90.6% | 93.9% |
| GumTree 2.1.0 | 96.7% | 90.4% | 93.5% |
RefactoringMiner achieves the highest empirical accuracy in statement-level mapping, uniquely providing multi-mapping support and semantic compatibility guarantees. All tool source code and benchmarks are publicly accessible, facilitating replicable advances in refactoring detection research.