Test Smell Detection Tools: A Systematic Mapping Study (2104.14640v2)

Published 29 Apr 2021 in cs.SE

Abstract: Test smells are defined as sub-optimal design choices developers make when implementing test cases. Hence, similar to code smells, the research community has produced numerous test smell detection tools to investigate the impact of test smells on the quality and maintenance of test suites. However, little is known about the characteristics, type of smells, target language, and availability of these published tools. In this paper, we provide a detailed catalog of all known, peer-reviewed, test smell detection tools. We start with performing a comprehensive search of peer-reviewed scientific publications to construct a catalog of 22 tools. Then, we perform a comparative analysis to identify the smell types detected by each tool and other salient features that include programming language, testing framework support, detection strategy, and adoption, among others. From our findings, we discover tools that detect test smells in Java, Scala, Smalltalk, and C++ test suites, with Java support favored by most tools. These tools are available as command-line and IDE plugins, among others. Our analysis also shows that most tools overlap in detecting specific smell types, such as General Fixture. Further, we encounter four types of techniques these tools utilize to detect smells. We envision our study as a one-stop source for researchers and practitioners in determining the tool appropriate for their needs. Our findings also empower the community with information to guide future tool development.

Authors (9)

Wajdi Aljedaani
Anthony Peruma
Ahmed Aljohani
Mazen Alotaibi
Mohamed Wiem Mkaouer
Ali Ouni
Christian D. Newman
Abdullatif Ghallab
Stephanie Ludi

Citations (61)


Summary

The paper presents a systematic mapping study that catalogs 22 test smell detection tools and identifies 66 unique test smell types.
It employs a rigorous methodology by filtering an initial 436 papers to 47 studies, detailing tools' features such as language support and detection techniques.
Findings reveal a strong focus on Java-based tools and highlight gaps in refactoring support, accuracy reporting, and tool availability for other languages.

This paper presents a Systematic Mapping Study (SMS) focused on identifying and characterizing tools designed to detect "test smells," which are suboptimal design choices in test code that can hinder maintainability and quality. The paper aims to provide a comprehensive catalog and analysis of available, peer-reviewed test smell detection tools, addressing the lack of consolidated information on their features, supported smells, target languages, and availability.

Research Questions:

RQ1: What test smell detection tools are available, and what common smell types do they support?
RQ2: What are the main characteristics of these tools (e.g., platform support, detection mechanisms)?

Methodology:

The authors followed a standard SMS process:

Planning: Defined search keywords (e.g., "test smell", "tool", "detect"), selected six digital libraries (ACM, IEEE Xplore, etc.), and established inclusion/exclusion criteria (peer-reviewed, English, proposing/using a tool, published before 2021).
Execution: An initial search yielded 436 papers. After filtering duplicates, applying criteria to titles/abstracts, performing full-text analysis, and conducting forward/backward snowballing, 47 relevant primary studies remained. These were categorized into 22 tool development papers and 25 tool adoption papers.
Synthesis: Data was extracted from the 22 tool development papers regarding the tool's name, detected smells, supported languages/frameworks, detection techniques, availability, correctness metrics, and adoption in later studies.
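The inclusion criteria from the planning step can be expressed as a simple filter. The sketch below is a minimal Python illustration, not the authors' tooling. It assumes each candidate paper is a record with hypothetical fields (`peer_reviewed`, `language`, `involves_tool`, `year`, `doi`). In the study, title/abstract screening, full-text analysis, and snowballing were manual steps, so this only mirrors the criteria themselves.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record layout for one search result.
    title: str
    peer_reviewed: bool
    language: str
    involves_tool: bool  # proposes or uses a test smell detection tool
    year: int
    doi: str


def select_primary_studies(candidates):
    """Drop duplicates (same DOI), then keep papers meeting every criterion."""
    seen = set()
    selected = []
    for c in candidates:
        if c.doi in seen:
            continue  # same paper returned by more than one digital library
        seen.add(c.doi)
        if (c.peer_reviewed
                and c.language == "English"
                and c.involves_tool
                and c.year < 2021):
            selected.append(c)
    return selected
```

Deduplicating on DOI is an assumption made for the sketch; records without a DOI would need another key, such as a normalized title.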

Key Findings (RQ1):

Tool Catalog: 22 distinct, peer-reviewed test smell detection tools were identified, published between 2006 (TRex) and 2020. There was a noticeable increase in both tool development and adoption publications in 2019 and 2020.
Detected Smells: The tools collectively detect 66 unique types of test smells (definitions provided in the paper). TestLint detects the most (26), followed by JNose Test (21) and tsDetect (19).
Smell Overlap: Many tools detect overlapping sets of smells. The most commonly detected smells across tools are General Fixture (9 tools), Eager Test (7 tools), and Assertion Roulette (6 tools). Some smells have similar definitions but different names (e.g., Assertionless, Assertionless Test, Unknown Test).
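Because the same smell can appear under different names, counting overlap across tools requires normalizing names first. The following is a minimal Python sketch under stated assumptions. The alias table is hypothetical: it maps the three synonyms named above to one canonical label chosen purely for illustration. The tool names in any input (e.g., `ToolA`) are placeholders, not tools from the catalog.

```python
from collections import Counter

# Hypothetical alias table: tool-specific smell names -> one canonical label.
# The synonyms come from the study; the canonical label is an assumption.
ALIASES = {
    "Assertionless": "Assertionless Test",
    "Assertionless Test": "Assertionless Test",
    "Unknown Test": "Assertionless Test",
}


def canonical(smell: str) -> str:
    """Return the canonical name for a smell, or the name itself if unaliased."""
    return ALIASES.get(smell, smell)


def smell_overlap(tool_smells: dict[str, set[str]]) -> Counter:
    """Count how many tools detect each (canonicalized) smell type."""
    counts = Counter()
    for smells in tool_smells.values():
        # Building a set means a tool is counted once per canonical smell,
        # even if it reports two aliases of the same smell.
        counts.update({canonical(s) for s in smells})
    return counts
```

For example, `smell_overlap({"ToolA": {"General Fixture", "Assertionless"}, "ToolB": {"General Fixture", "Unknown Test"}})` counts General Fixture and Assertionless Test twice each, whereas counting raw names would report two unrelated smells.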
Supported Languages: Java is the most supported language (39 smell types), predominantly targeting the JUnit framework. Other supported languages include Smalltalk (28 types, SUnit), C++ (12 types, CppUnit/QTest), and Scala (6 types, ScalaTest). There's a notable lack of tools for languages like Python or JavaScript.

Key Findings (RQ2):

Tool Characteristics:
- Implementation/Target: Most tools (~86%) are implemented in and/or analyze Java code, focusing on JUnit.
- Correctness: Only 6 out of 22 tools report detection accuracy (precision, recall, or F-measure).
- Refactoring: Only 5 tools (DARTS, RAIDE, RTj, TestHound, and TRex) offer some form of refactoring support for detected smells.
- Interface: Tools are available as command-line utilities, IDE plugins (Eclipse, IntelliJ, Pharo), or standalone desktop/web applications.
- Availability: 17 tools had accessible websites or source code repositories. tsDetect had the most forks (21).
- Documentation: Usage guides were available for 16 tools.
- Adoption: Most tools have low adoption in subsequent research; tsDetect and the unnamed tool by Bavota et al. [bavota2012empirical] were the most frequently reused.
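The correctness metrics mentioned above follow their standard definitions. The sketch below computes them for a single smell type, assuming hypothetical inputs: `reported` is the set of test methods a tool flags, and `oracle` is a manually validated set of truly smelly methods. Returning 0.0 when a denominator is zero is a convention chosen here, not something the study prescribes.

```python
def detection_scores(reported: set, oracle: set) -> dict:
    """Precision, recall, and F-measure for one smell type."""
    tp = len(reported & oracle)  # flagged and actually smelly
    fp = len(reported - oracle)  # flagged but not smelly
    fn = len(oracle - reported)  # smelly but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

With `reported = {"t1", "t2", "t3", "t4"}` and `oracle = {"t1", "t2", "t5"}`, precision is 0.5, recall is 2/3, and F-measure is 4/7 (about 0.571). Calling the function once per smell type yields per-smell figures rather than a single aggregate.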
Detection Techniques: Four main strategies were identified:
- Metrics-based: Using code metrics and thresholds (e.g., TestQ, TestHound).
- Rules/Heuristic-based: Combining metrics with specific code patterns (most common, e.g., tsDetect, Bavota's tool, JNose Test).
- Information Retrieval: Using text processing (stemming, TF-IDF) and ML on code identifiers/comments (e.g., Taste, DARTS, TEDD).
- Dynamic Tainting: Runtime analysis monitoring data flow, often used for dependency or state-related smells (e.g., OraclePolish, PolDet, DTDetector, Pradet).
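To make the rules/heuristic-based strategy concrete, here is a deliberately naive Python sketch that scans JUnit source text. It is not the implementation of tsDetect, JNose Test, or any other cataloged tool. Assumptions: test methods are `void` methods annotated with `@Test`, braces never occur inside string literals or comments, and assertion messages use the JUnit 4 convention of a leading string argument. The rules are simplified paraphrases: Empty Test (no code in the body), Unknown Test (no assertion), and Assertion Roulette (two or more assertions without a message).

```python
import re

TEST_METHOD = re.compile(
    r"@Test\s+(?:public\s+)?void\s+(\w+)\s*\([^)]*\)\s*(?:throws[^{]*)?\{")
COMMENTS = re.compile(r"/\*.*?\*/|//[^\n]*", re.S)
ASSERT_CALL = re.compile(r"\bassert\w*\s*\(")
ASSERT_WITH_MSG = re.compile(r'\bassert\w*\s*\(\s*"')  # JUnit 4 message first


def extract_test_methods(java_src: str) -> dict[str, str]:
    """Map each @Test method name to its body using naive brace matching."""
    methods = {}
    for m in TEST_METHOD.finditer(java_src):
        depth, i = 1, m.end()
        while depth and i < len(java_src):
            if java_src[i] == "{":
                depth += 1
            elif java_src[i] == "}":
                depth -= 1
            i += 1
        methods[m.group(1)] = java_src[m.end():i - 1]  # exclude closing brace
    return methods


def detect_smells(java_src: str) -> dict[str, list[str]]:
    """Apply the three simplified rules to every test method."""
    findings = {}
    for name, body in extract_test_methods(java_src).items():
        code = COMMENTS.sub("", body)
        smells = []
        if not code.strip():
            smells.append("Empty Test")
        else:
            n_asserts = len(ASSERT_CALL.findall(code))
            n_unexplained = n_asserts - len(ASSERT_WITH_MSG.findall(code))
            if n_asserts == 0:
                smells.append("Unknown Test")
            elif n_unexplained > 1:
                smells.append("Assertion Roulette")
        findings[name] = smells
    return findings
```

Production tools typically work on a parsed syntax tree rather than regular expressions, which avoids the brace and string-literal pitfalls this sketch accepts for brevity.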

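The information-retrieval strategy can be illustrated by scoring how textually related the test methods of one class are. The sketch below is a hypothetical example and should not be read as the method of Taste, DARTS, or TEDD. It splits identifiers on camelCase, weights terms with smoothed TF-IDF (no stemming, for brevity), and averages pairwise cosine similarity. A low score might hint at a cohesion-related smell, but any cut-off threshold would be an assumption that needs calibration against labeled data.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    """Split identifiers on camelCase and non-letters; lowercase the pieces."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed IDF so shared terms keep weight."""
    tfs = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(t for tf in tfs for t in tf)  # document frequency per term
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
            for tf in tfs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def class_cohesion(method_texts: list[str]) -> float:
    """Mean pairwise cosine similarity of a class's test methods."""
    vecs = tfidf_vectors(method_texts)
    pairs = list(combinations(vecs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
```

Here each string in `method_texts` would hold a test method's identifiers and comments; how that text is extracted is left open.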
Discussion & Takeaways:

Standardization Needed: Fragmentation exists in smell names and definitions across tools.
Language Support: More tools are needed for non-Java languages (especially Python, JavaScript) and diverse testing frameworks.
Reuse vs. Reinvention: Researchers should consider extending existing tools rather than building new ones from scratch; tool design should facilitate customization.
Transparency: Better reporting of tool correctness (precision/recall per smell) is needed, potentially alongside community benchmarks.
Beyond Detection: More focus is needed on robust, validated refactoring support within tools.

Conclusion:

The paper provides a valuable catalog and comparative analysis of 22 test smell detection tools. It highlights the prevalence of Java/JUnit support, the variety of detection techniques, and significant gaps in reporting correctness, supporting other languages, and providing refactoring capabilities. The findings serve as a resource for practitioners choosing a tool and guide future research toward improving tool quality, scope, and standardization.
