Quantifying and Characterizing Clones of Self-Admitted Technical Debt in Build Systems (2402.08920v1)
Abstract: Self-Admitted Technical Debt (SATD) annotates development decisions that intentionally exchange long-term software artifact quality for short-term goals. Recent work explores the existence of SATD clones (duplicate or near duplicate SATD comments) in source code. Cloning of SATD in build systems (e.g., CMake and Maven) may propagate suboptimal design choices, threatening qualities of the build system that stakeholders rely upon (e.g., maintainability, reliability, repeatability). Hence, we conduct a large-scale study on 50,608 SATD comments extracted from Autotools, CMake, Maven, and Ant build systems to investigate the prevalence of SATD clones and to characterize their incidences. We observe that: (i) prior work suggests that 41-65% of SATD comments in source code are clones, but in our studied build system context, the rates range from 62% to 95%, suggesting that SATD clones are a more prevalent phenomenon in build systems than in source code; (ii) statements surrounding SATD clones are highly similar, with 76% of occurrences having similarity scores greater than 0.8; (iii) a quarter of SATD clones are introduced by the author of the original SATD statements; and (iv) among the most commonly cloned SATD comments, external factors (e.g., platform and tool configuration) are the most frequent locations, limitations in tools and libraries are the most frequent causes, and developers often copy SATD comments that describe issues to be fixed later. Our work presents the first step toward systematically understanding SATD clones in build systems and opens up avenues for future work, such as distinguishing different SATD clone behavior, as well as designing an automated recommendation system for repaying SATD effectively based on resolved clones.