When Code Smells Meet ML: On the Lifecycle of ML-specific Code Smells in ML-enabled Systems (2403.08311v1)
Abstract: Context. The adoption of Machine Learning (ML)-enabled systems is steadily increasing. Nevertheless, there is a shortage of ML-specific quality assurance approaches, possibly because of the limited knowledge of how quality-related concerns emerge and evolve in ML-enabled systems. Objective. We aim to investigate the emergence and evolution of specific types of quality-related concerns known as ML-specific code smells, i.e., sub-optimal implementation solutions applied to ML pipelines that may significantly decrease both the quality and maintainability of ML-enabled systems. More specifically, we present a plan to study ML-specific code smells by empirically analyzing (i) their prevalence in real ML-enabled systems, (ii) how they are introduced and removed, and (iii) their survivability. Method. We will conduct an exploratory study, mining a large dataset of ML-enabled systems and analyzing over 400k commits across 337 projects. We will track and inspect the introduction and evolution of ML smells through CodeSmile, a novel ML smell detector that we will build to enable our investigation and to detect ML-specific code smells.
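To make the notions of introduction, removal, and survivability concrete, the following is a minimal Python sketch of how the lifetime of one smell could be tracked across a file's commit history. It is an illustration under stated assumptions, not a description of CodeSmile: the function names `has_nan_equality` and `smell_lifetimes`, the toy `history` data, and the choice of a single smell (comparing a value to `np.nan` with `==`) are all hypothetical, and survival is measured here simply as a count of commits.

```python
import ast


def has_nan_equality(source: str) -> bool:
    """Return True if the code compares something to `<name>.nan` with == or !=.

    Hypothetical stand-in for a smell detector; it only inspects the syntax tree.
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare) and any(
            isinstance(op, (ast.Eq, ast.NotEq)) for op in node.ops
        ):
            operands = [node.left, *node.comparators]
            if any(isinstance(o, ast.Attribute) and o.attr == "nan" for o in operands):
                return True
    return False


def smell_lifetimes(snapshots):
    """Track smell episodes over chronologically ordered (commit_id, source) pairs.

    Returns a list of (introduced_at, removed_at, commits_survived) tuples;
    removed_at is None when the smell is still present in the last snapshot.
    """
    lifetimes = []
    start = None  # (index, commit_id) of the commit that introduced the smell
    for index, (commit_id, source) in enumerate(snapshots):
        smelly = has_nan_equality(source)
        if smelly and start is None:
            start = (index, commit_id)
        elif not smelly and start is not None:
            lifetimes.append((start[1], commit_id, index - start[0]))
            start = None
    if start is not None:
        # Smell never removed within the observed history.
        lifetimes.append((start[1], None, len(snapshots) - start[0]))
    return lifetimes


# Toy history (assumed data): the smell appears in c2 and is fixed in c4.
history = [
    ("c1", "import numpy as np\nx = 1\n"),
    ("c2", "import numpy as np\nmask = df['a'] == np.nan\n"),
    ("c3", "import numpy as np\nmask = df['a'] == np.nan\nprint(mask)\n"),
    ("c4", "import numpy as np\nmask = df['a'].isna()\n"),
]
print(smell_lifetimes(history))
```

In a real mining study the snapshots would come from a repository-mining tool rather than hard-coded strings, and survival would more likely be measured in elapsed time as well as in commits.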