
Increasing, not Diminishing: Investigating the Returns of Highly Maintainable Code (2401.13407v1)

Published 24 Jan 2024 in cs.SE

Abstract: Understanding and effectively managing Technical Debt (TD) remains a vital challenge in software engineering. While many studies on code-level TD have been published, few illustrate the business impact of low-quality source code. In this study, we combine two publicly available datasets to study the association between code quality on the one hand, and defect count and implementation time on the other hand. We introduce a value-creation model, derived from regression analyses, to explore relative changes from a baseline. Our results show that the associations vary across different intervals of code quality. Furthermore, the value model suggests strong non-linearities at the extremes of the code quality spectrum. Most importantly, the model suggests amplified returns on investment in the upper end. We discuss the findings within the context of the "broken windows" theory and recommend organizations to diligently prevent the introduction of code smells in files with high churn. Finally, we argue that the value-creation model can be used to initiate discussions regarding the return on investment in refactoring efforts.


Summary

  • The paper demonstrates that highly maintainable code yields increasing returns by significantly lowering defect counts at critical quality thresholds.
  • The analysis applies polynomial regression to data from 79 projects to reveal non-linear associations between code health and both defect count and development time.
  • The study’s value-creation model challenges common assumptions by showing that continuous investment in code quality leads to amplified business benefits.

An Analytical Overview of "Increasing, not Diminishing: Investigating the Returns of Highly Maintainable Code"

The paper "Increasing, not Diminishing: Investigating the Returns of Highly Maintainable Code" by Borg et al. explores the implications of code maintainability on software defect rates and development time. The paper draws a substantial dataset from proprietary software projects to analyze the correlation between code quality, represented through Code Health (CH), and two critical operational aspects: defect count and the time involved in resolving issues.

Core Investigations and Methodology

The paper introduces a value-creation model derived from regression analyses of two publicly available datasets covering 79 proprietary software projects. The researchers employ the CodeScene tool to derive CH values, which categorize code quality into three intervals: healthy (CH ≥ 9), warning (4 ≤ CH < 9), and alert (CH < 4). This approach provides a nuanced perspective on the non-linear associations between code quality and the key outcomes of defect count and implementation time.
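
To make the three intervals concrete, here is a minimal Python sketch of the categorization. Only the interval boundaries come from the summary above; the function and label names are illustrative assumptions.

```python
# A minimal sketch of the three-interval Code Health categorization described
# above. Only the boundaries (alert < 4, warning 4-9, healthy >= 9) come from
# the paper's summary; the function and label names are illustrative.

def categorize_code_health(ch: float) -> str:
    """Map a CodeScene Code Health score onto the paper's three intervals."""
    if ch >= 9:
        return "healthy"    # CH >= 9
    if ch >= 4:
        return "warning"    # 4 <= CH < 9
    return "alert"          # CH < 4

print(categorize_code_health(9.5))  # healthy
print(categorize_code_health(6.0))  # warning
print(categorize_code_health(2.5))  # alert
```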

To test these assumptions, the authors investigate the association between CH and the average defect count per file, as well as the average Time-in-Development (Time-in-Dev). They fit polynomial regression models to capture non-linearities in these associations, a modeling choice motivated by the "broken windows" theory, which suggests that neglecting code quality invites further deterioration and inefficiency.
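
The shape of such an analysis can be sketched in a few lines of Python. This is not the paper's actual pipeline: the synthetic data and the chosen polynomial degree are assumptions made purely for illustration.

```python
import numpy as np

# Hedged sketch: fit a polynomial regression of defect count on Code Health
# (CH), in the spirit of the paper's analysis. The synthetic data and the
# degree below are assumptions; the paper fits its models to two real
# datasets covering 79 projects.

rng = np.random.default_rng(42)
ch = rng.uniform(1, 10, size=500)                       # synthetic CH scores
defects = np.exp(-(ch - 1) / 3) + rng.normal(0, 0.05, size=500)  # toy trend

degree = 3                                              # assumed model degree
model = np.poly1d(np.polyfit(ch, defects, deg=degree))

for level in (2, 5, 8, 10):
    print(f"CH={level}: predicted defect count = {model(level):.3f}")
```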

Key Findings

  1. Defect Count Trends: The analysis reveals a negative correlation between CH and defect count in the low- and high-quality intervals (CH ≤ 5 and CH ≥ 8, respectively). Interestingly, this correlation weakens in the midrange of the CH spectrum (5 ≤ CH ≤ 8). The pattern indicates that improving code quality from average to excellent yields a significant reduction in defects, underscoring the value of maintaining high-quality codebases.
  2. Time-in-Development Trends: Time-in-Dev declines clearly as CH improves beyond 4, with higher variability observed at lower CH values. This suggests faster implementation times for issues in higher-quality code, reaffirming the productivity benefits of investing in code quality.
  3. Value-Creation Insights: The proposed value-creation model explores how different CH levels affect the business value of software development. Remarkable non-linearities appear, especially at the upper end of the CH spectrum: improving code quality from high to very high yields amplified returns, contradicting the common assumption of diminishing returns. This insight advocates continuous investment in quality improvement, especially for critical code components (see the sketch after this list).
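
The relative-change idea behind the value-creation model can be illustrated as follows. The coefficients, outcome values, and the baseline CH level here are made-up assumptions; the paper derives its model from regression analyses of the two real datasets.

```python
import numpy as np

# Hedged sketch of the value-creation idea: express a predicted outcome as a
# relative change from a baseline CH level. The toy fit and the baseline are
# illustrative assumptions, not values from the paper.

coeffs = np.polyfit([1, 4, 7, 10], [1.0, 0.55, 0.35, 0.10], deg=2)  # toy fit
predict_defects = np.poly1d(coeffs)

baseline_ch = 7.0                # assumed baseline; not taken from the paper
baseline = predict_defects(baseline_ch)

for ch in (4.0, 7.0, 9.0, 10.0):
    relative = (predict_defects(ch) - baseline) / baseline
    print(f"CH={ch:.0f}: {relative:+.0%} defects relative to CH={baseline_ch:.0f}")
```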

Implications and Future Directions

The implications of these findings are substantial. Improving the maintainability of high-importance files can directly influence a project's success by reducing defect rates and improving development efficiency. The results support zero-tolerance policies for code smells in high-churn files and strategic allocation of resources to refactoring efforts.

From a theoretical standpoint, the research contributes to understanding the non-linear nature of code quality returns, expanding on the broken windows theory within the software maintenance context. This prompts a reevaluation of strategies related to technical debt management and encourages further examination of code quality's business value.

In the field of AI and automated tools, the paper lays groundwork for applying analytical methods to prioritize code quality interventions, and it makes a case for integrating maintainability metrics into broader software development and maintenance frameworks.

To build on these insights, future research could explore more predictive and causal analyses, incorporating additional confounding factors such as file size and code coupling. Moreover, developing models tailored for organizational decision-making in technical debt trade-offs holds the potential to transform how software maintainability is approached in practice.