
PHP code smells in web apps: survival and anomalies (2101.00090v1)

Published 31 Dec 2020 in cs.SE

Abstract: Context: Code smells are considered symptoms of poor design, leading to future problems, such as reduced maintainability. Except for anecdotal cases (e. g. code dropout), a code smell survives until it gets explicitly refactored or removed. This paper presents a longitudinal study on the survival of code smells for web apps built with PHP. Objectives: RQ: (i) code smells survival depends on their scope? (ii) practitioners attitudes towards code smells removal in web apps have changed throughout time? (iii) how long code smells survive in web applications? (iv) are there sudden variations (anomalies) in the density of code smells through the evolution of web apps? Method: We analyze the evolution of 6 code smells in 8 web applications written in PHP at the server side, across several years, using the survival analysis technique. We classify code smells according to scope in two categories: scattered and localized. Scattered code smells are expected to be more harmful since their influence is not circumscribed as in localized code smells. We split the observations for each web app into two equal and consecutive timeframes, to test the hypothesis that code smells awareness has increased throughout time. As for the anomalies, we standardize their detection criteria. Results: We present some evidence that code smells survival depends on their scope: the average survival rate decreases in some of them, while the opposite is observed for the remainder. The survival of localized code smells is around 4 years, while the scattered ones live around 5 years. Around 60% of the smells are removed, and some live through all the application life. We also show how a graphical representation of anomalies found in the evolution of code smells allows unveiling the story of a development project and make managers aware of the need for enforcing regular refactoring practices.

Citations (5)

Summary

  • The paper presents a longitudinal study using survival analysis to compare the removal rates of localized and scattered PHP code smells.
  • It applies an automated workflow with tools like PHPMD and R packages to measure code smell longevity and detect anomalies in smell density.
  • The study highlights the need for proactive refactoring, as about 40% of code smells persist over years, indicating persistent technical debt.

This paper (2101.00090) presents a longitudinal study of the survival and evolution patterns of code smells in PHP web applications. It aims to provide practical insights for developers and managers regarding code quality and technical debt management in the context of PHP, a widely used server-side language for web development.

Methodology & Implementation:

  1. Application & Smell Selection:
    • Applications: 8 open-source PHP web applications (PhpMyAdmin, DokuWiki, OpenCart, phpBB, PhpPgAdmin, MediaWiki, PrestaShop, Vanilla) were selected based on criteria like being open-source, using Object-Oriented PHP, having a long history (minimum 5 years), and being complete applications (not libraries or frameworks).
    • Code Smells: 6 specific code smells were chosen, representing both localized (within a class/method) and scattered (across multiple classes) scopes:
      • Localized: ExcessiveMethodLength (Long Method), ExcessiveClassLength (God Class), ExcessiveParameterList (Long Parameter List).
      • Scattered: DepthOfInheritance, CouplingBetweenObjects, NumberOfChildren.
    • Detection Tool: PHPMD (PHP Mess Detector) was used with its default thresholds to detect these smells.
  2. Data Collection & Preparation Workflow:
    • A largely automated workflow was implemented:
      • Download source code versions from repositories (GitHub, SourceForge, etc.).
      • Extract version timestamps.
      • Run PHPMD on each version to detect and locate code smells, storing results in XML.
      • Parse XML results and store smell data (location, version detected) in a MySQL database using a custom PHP script (CodeSmells2DB).
      • Merge timestamp data with smell data.
      • Transform the version-by-version smell data into a survival analysis format using another custom PHP script (CSLong2CSSurv). This script determines the introduction and removal version/date for each smell instance.
      • Calculate survival times and censoring status (0 = censored, i.e. the smell is still present in the last observed version; 1 = removed).
      • Extract code size metrics (LOC, LLOC, Classes) using PHPLOC for each version.
      • Export data to CSV for analysis in R.
    • The collected dataset is publicly available for replication purposes.
  3. Analysis Techniques:
    • Survival Analysis: Kaplan-Meier estimators were used to estimate survival probabilities (how likely a smell is to persist past time t). Log-rank tests were used to compare survival curves between groups (localized vs. scattered smells; earlier vs. later timeframes). The R packages survival, survminer, and dplyr were utilized.
    • Anomaly Detection: A method was developed to detect sudden changes (anomalies) in code smell density over time:
      • Calculate code smell density for version i: $pcs_i = \frac{\#\,\text{Code Smells}_i}{\text{LLOC}_i}$
      • Calculate the relative rate of change in density between versions: $\Delta pcs = \frac{pcs_i - pcs_{i-1}}{pcs_{i-1}}$
      • Visualize this rate of change over time/versions and use thresholds (e.g., +/- 50%, +100%) to flag significant increases or decreases.
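
The survival-analysis steps above (turning version-by-version detections into durations with a censoring flag, then estimating Kaplan-Meier survival) can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' PHP scripts or the R `survival` package they used. The function names, the input layout, and the simplifying assumption that a smell is present continuously between its first and last sighting are all hypothetical choices made for this example.

```python
from datetime import date


def to_survival_records(releases, detections):
    """Convert per-version detections into (smell_id, days, status) records.

    releases:   list of (version, release_date), sorted by date.
    detections: dict mapping a smell id to the set of versions it was seen in.
    status:     1 = removed (event observed), 0 = censored (still present
                in the last observed release).
    Assumption: a smell is present in every release between its first and
    last sighting.
    """
    index = {version: i for i, (version, _) in enumerate(releases)}
    last_release = len(releases) - 1
    records = []
    for smell_id, versions in detections.items():
        seen = sorted(index[v] for v in versions)
        first, last = seen[0], seen[-1]
        start = releases[first][1]
        if last < last_release:
            # Removed: the event date is the first release without the smell.
            end, status = releases[last + 1][1], 1
        else:
            # Still present at the end of the observation window.
            end, status = releases[last_release][1], 0
        records.append((smell_id, (end - start).days, status))
    return records


def kaplan_meier(records):
    """Return [(t, S(t))] at each time where at least one removal occurs."""
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted({days for _, days, _ in records}):
        events = sum(1 for _, d, s in records if d == t and s == 1)
        leaving = sum(1 for _, d, _ in records if d == t)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Removed and censored smells both leave the risk set after time t.
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # more than half of the smells outlive the observation window


# Made-up example data: four yearly releases and four smell instances.
releases = [
    ("v1", date(2010, 1, 1)),
    ("v2", date(2011, 1, 1)),
    ("v3", date(2012, 1, 1)),
    ("v4", date(2013, 1, 1)),
]
detections = {
    "A": {"v1", "v2"},              # removed in v3
    "B": {"v1", "v2", "v3", "v4"},  # never removed (censored)
    "C": {"v2"},                    # removed in v3
    "D": {"v3", "v4"},              # never removed (censored)
}
records = to_survival_records(releases, detections)
curve = kaplan_meier(records)
print(records)
print(curve, median_survival(curve))
```

Censored smells still count toward the risk set up to their last observed time, which is why the estimate differs from simply averaging the lifetimes of removed smells.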

Key Findings & Practical Implications:

  1. Smell Scope Matters (RQ1):
    • In most of the applications studied, localized smells have a statistically significantly shorter lifespan (median ~4 years) than scattered smells (median ~5 years).
    • Localized smells are removed more frequently (~62% removal rate) than scattered smells (~39% removal rate).
    • Implication: Scattered smells represent more persistent technical debt. Removing them is harder (potentially due to lack of tooling and higher effort) and happens less often. Teams may need to explicitly allocate resources or find better strategies/tools to address scattered smells like high coupling or deep inheritance.
  2. Time Trends Vary (RQ2):
    • There isn't a universal trend of faster smell removal in more recent years across all projects. While some projects showed decreased survival times in the latter half of their history (suggesting increased awareness or dedicated refactoring), others showed the opposite or mixed results.
    • Increasing software complexity (measured by class count growth, aligning with Lehman's Laws) acts as an inertia factor, potentially making refactoring harder over time.
    • Implication: Simply assuming teams get better at managing smells over time due to general awareness is unreliable. Project-specific factors (team practices, complexity growth, dedicated refactoring efforts) dominate. Continuous monitoring and proactive refactoring are necessary.
  3. Smell Lifespan (RQ3):
    • Across all apps and smells, the median survival time is around 4 years, and about 60% of smells are eventually removed. However, a significant portion (about 40%) persists, potentially for the entire observed lifetime of the application.
    • Survival times vary considerably between applications, likely reflecting different development practices and priorities.
    • Implication: Code smells introduced are likely to remain for years unless actively addressed. This reinforces the cost of delaying refactoring (technical debt interest).
  4. Anomaly Detection is Possible (RQ4):
    • Monitoring the relative change in code smell density (smells per LLOC) effectively highlights versions with unusual increases (potential quality degradation, large feature additions without cleanup) or decreases (significant refactoring efforts).
    • This method can help distinguish genuine refactoring from apparent smell removal caused by code restructuring (like file renaming).
    • Implication: This anomaly detection technique is practical for integration into CI/CD pipelines. By setting thresholds on the rate of change in smell density, teams can get automated, just-in-time warnings about potential quality issues or confirm the positive impact of refactoring in new releases.
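
The density-based anomaly check described for RQ4 could look roughly like the sketch below. It computes the smell density per version (smell count divided by LLOC), the relative change against the previous version, and flags changes beyond a threshold. The function names, the input tuple layout, and the default thresholds of +50% and -50% (taken from the example thresholds mentioned in the methodology) are assumptions for illustration, not the paper's implementation.

```python
def relative_changes(versions):
    """Relative change in smell density between consecutive versions.

    versions: list of (label, smell_count, lloc) in release order.
    Returns a list of (label, delta) starting at the second version.
    Versions whose predecessor has zero density are skipped, because the
    relative change is undefined there.
    """
    changes = []
    previous = None
    for label, smell_count, lloc in versions:
        density = smell_count / lloc
        if previous is not None and previous > 0:
            changes.append((label, (density - previous) / previous))
        previous = density
    return changes


def flag_anomalies(changes, increase=0.5, decrease=-0.5):
    """Label versions whose density change crosses either threshold."""
    flagged = []
    for label, delta in changes:
        if delta >= increase:
            flagged.append((label, "increase", delta))
        elif delta <= decrease:
            flagged.append((label, "decrease", delta))
    return flagged


# Made-up example history: (version, number of smells, LLOC).
history = [
    ("1.0", 100, 10000),
    ("1.1", 110, 11000),  # density unchanged
    ("2.0", 240, 12000),  # density doubles
    ("2.1", 104, 13000),  # density drops by about 60%
]
for label, kind, delta in flag_anomalies(relative_changes(history)):
    print(f"{label}: {kind} of {delta:+.0%} in smell density")
```

In a build pipeline, a step like this might fail or warn when the flagged list is non-empty for the release under test. How strict to make the thresholds is a project-level decision.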

Practical Applications:

  • Code Quality Monitoring: Use tools like PHPMD to regularly detect smells in PHP projects.
  • Refactoring Strategy: Prioritize addressing long-lived, scattered smells, as they tend to persist. Use survival data (like the ~5-year median for scattered smells) to justify allocation of refactoring resources.
  • CI/CD Integration: Implement the proposed anomaly detection (tracking relative change in CS_count / LLOC) in build pipelines to flag releases with sharp quality degradation or verify refactoring effectiveness.
  • Technical Debt Management: Use the paper's findings (survival times, removal rates) and the anomaly detection method to quantitatively track and manage technical debt related to code smells.
  • Benchmarking: Compare a project's smell survival rates and density trends against the averages reported in the paper to gauge relative code health.