Detecting Refactoring Commits in Machine Learning Python Projects: A Machine Learning-Based Approach (2404.06572v1)
Abstract: Refactoring improves software quality without altering its functional behavior. Understanding developers' refactoring activities is crucial to improving software maintainability. With the increasing use of ML libraries and frameworks, keeping these projects maintainable is equally important. Because ML projects are data-driven, they often undergo distinct refactoring operations (e.g., data manipulation), and existing refactoring tools lack ML-specific capabilities for detecting them. Furthermore, many ML libraries are written in Python, for which few refactoring detection tools exist. PyRef, a rule-based, state-of-the-art tool for Python refactoring detection, can identify 11 types of refactoring operations; by comparison, Rminer can detect 99 types of refactoring in Java projects. We introduce MLRefScanner, a prototype tool that applies machine learning techniques to detect refactoring commits in ML Python projects. MLRefScanner identifies commits containing both ML-specific and general refactoring operations. An evaluation on 199 ML projects shows that MLRefScanner outperforms state-of-the-art approaches, achieving an overall precision of 94% and recall of 82%. Combining it with PyRef further raises performance to 95% precision and 99% recall. Our study highlights the potential of ML-driven approaches for detecting refactoring across diverse programming languages and technical domains, addressing the limitations of rule-based detection methods.
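The abstract reports precision and recall for commit-level detection and states that combining MLRefScanner with PyRef raises recall, but it does not say how the two tools' outputs are combined. The following is a minimal sketch, assuming a simple union rule (a commit counts as a refactoring commit if either detector flags it). The function names `precision_recall` and `combine_union` and the toy labels are hypothetical illustrations, not the paper's implementation or data.

```python
from typing import Sequence


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> tuple[float, float]:
    """Precision and recall for the positive class (True = refactoring commit)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))      # correctly flagged
    fp = sum(p and not a for p, a in zip(predicted, actual))  # flagged, but not refactoring
    fn = sum(a and not p for p, a in zip(predicted, actual))  # refactoring, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def combine_union(*detectors: Sequence[bool]) -> list[bool]:
    """Hypothetical combination rule: flag a commit if ANY detector flags it."""
    return [any(flags) for flags in zip(*detectors)]


if __name__ == "__main__":
    # Toy ground truth for six commits (invented for illustration only).
    actual = [True, True, True, True, False, False]
    ml_pred = [True, True, True, False, False, False]      # stand-in for an ML classifier
    rule_pred = [False, False, False, True, True, False]   # stand-in for a rule-based tool
    print(precision_recall(ml_pred, actual))    # (1.0, 0.75)
    print(precision_recall(rule_pred, actual))  # (0.5, 0.25)
    combined = combine_union(ml_pred, rule_pred)
    print(precision_recall(combined, actual))   # (0.8, 1.0)
```

Under a union rule, recall can only stay the same or rise, because every commit either tool catches stays caught. Precision can move in either direction depending on how many false positives the second tool contributes. In this toy example it drops, whereas the abstract reports a slight increase for the real combination.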