
Bug Taxonomy: Classifying Software Defects

Updated 18 March 2026
  • Bug taxonomy is a structured classification scheme that organizes software defects by attributes such as origin, manifestation, severity, and domain specificity.
  • It supports empirical analysis and bug triage automation by leveraging data from version control, issue trackers, and developer feedback to streamline root-cause identification.
  • Domain-specific refinements, like those in quantum software and Jupyter Notebooks, guide targeted testing strategies and speed up defect resolution across diverse ecosystems.

A bug taxonomy is a structured classification scheme for categorizing software defects based on their origin, manifestation, type, severity, impact, or domain specificity. Rigorous bug taxonomies are central to empirical software engineering, bug triage automation, root-cause analysis, and the creation of diagnostic and quality assurance tools. Modern research develops multidimensional taxonomies grounded in empirical data from version control, issue trackers, patch notes, and domain-specific developer feedback, recognizing that effective taxonomies must adapt to the context—generic software, domain applications (video games, quantum software, computational notebooks), or ecosystem-specific workflows (e.g., NPM packages).

1. Multidimensional Taxonomy Structures

Contemporary bug taxonomies are increasingly multidimensional, combining orthogonal categorization axes to capture defect diversity with granularity and precision.

  • Origin-Based Taxonomies: The InEx-Bug taxonomy for the NPM ecosystem formalizes a distinction between Intrinsic (defect in local codebase), Extrinsic (breakage induced by changed dependencies or environmental drift), Not-a-Bug, and Unknown (Wright et al., 13 Feb 2026).
  • Domain-Specific Taxonomies: In quantum computing, a five-axis taxonomy distinguishes bug type (quantum/classical/uncategorized), coarse category (16+ classes), severity (Critical/High/Medium/Low), impacted quality attributes (e.g., Usability, Reliability per ISO/IEC 25010), and quantum-specific subtypes (circuit issues, gate errors, etc.) (Yousuf et al., 12 Jun 2025).
  • Root Cause/Manifestation Frameworks: Catolino et al. provide a root-cause taxonomy for general-purpose OSS with nine categories: configuration, network, database, GUI, performance, permission/deprecation, security, program anomaly, and test code (Catolino et al., 2019).
  • Application/Phenomenology-Oriented: Jupyter Notebooks exhibit bugs in eight high-level categories—kernel, conversion, portability, environment/settings, connection, processing, cell defect, and implementation, each with subtypes tied to interactive notebook workflows (Santana et al., 2022).
  • Behavior/Subsystem Taxonomies (Game Software): Game update notes yield a 20-category taxonomy, spanning action, AI, audio, camera, collisions, crash, exploit, UI, value, and more, enabling recurrence and severity analysis in shipped game patches (Truelove et al., 2021).

This multidimensional approach facilitates automated classification and supports broad-spectrum defect analytics across heterogeneous codebases and ecosystems.
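
A minimal sketch of one multidimensional label, loosely modeled on the five-axis quantum scheme described above, may make the idea concrete. All names here (`BugLabel`, `BugType`, `Severity`, `count_by_axis`) are hypothetical, and `category` is left as free text rather than the study's 16+ class list. Because each axis is a separate field, prevalence can be tallied along any one axis independently of the others.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum


class BugType(Enum):
    QUANTUM = "quantum"
    CLASSICAL = "classical"
    UNCATEGORIZED = "uncategorized"


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass(frozen=True)
class BugLabel:
    """One issue labeled along independent axes (hypothetical schema)."""
    bug_type: BugType
    category: str                                # coarse category, free text here
    severity: Severity
    quality_attributes: frozenset = frozenset()  # e.g. {"Reliability", "Usability"}
    quantum_subtype: str | None = None           # e.g. "gate error"; quantum bugs only

    def __post_init__(self) -> None:
        # Axes are orthogonal, but a quantum subtype only makes sense on a quantum bug.
        if self.quantum_subtype is not None and self.bug_type is not BugType.QUANTUM:
            raise ValueError("quantum_subtype requires bug_type QUANTUM")


def count_by_axis(labels: list[BugLabel], axis: str) -> Counter:
    """Tally labels along one axis, e.g. 'severity' or 'bug_type'."""
    return Counter(getattr(label, axis) for label in labels)


labels = [
    BugLabel(BugType.CLASSICAL, "Performance", Severity.LOW, frozenset({"Reliability"})),
    BugLabel(BugType.QUANTUM, "Circuit", Severity.HIGH, frozenset({"Reliability"}), "gate error"),
    BugLabel(BugType.CLASSICAL, "Usability", Severity.LOW, frozenset({"Usability"})),
]
print(count_by_axis(labels, "severity")[Severity.LOW])  # 2
```

Freezing the record keeps labels hashable, so they can be deduplicated or used as dictionary keys when aggregating across projects.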

2. Canonical Bug Categories Across Domains

While terminology and granularity vary, core category archetypes recur in empirical taxonomies. The following table summarizes frequent top-level classes, with mapping across select studies:

| General (Catolino et al.) | NPM (InEx-Bug) | Quantum Software | Jupyter Notebooks | Video Games |
|---|---|---|---|---|
| Configuration | Intrinsic/Extrinsic | Compatibility, Syntax | Environments & Settings (ES) | Implementation Resp. |
| Performance | Intrinsic | Performance | Processing (PC) | Performance |
| Security | Intrinsic | Security | Environment/Extension (ES) | Security |
| GUI | Intrinsic | Usability | Kernel, Cell Defect (CD) | UI |
| Program/Functional | Intrinsic | Functional, Logical | Implementation (IP) | Action, AI, Bounds |
| Test Code | Not-a-Bug | Test Coverage | Not explicitly classed | Not in taxonomy |
| Network | Extrinsic | Communication | Connection (CN) | Not in taxonomy |
| Database | Intrinsic | Database | Connection/Processing | Not in taxonomy |

Domain-specific categories appear as needed: Quantum Circuit Issues (Yousuf et al., 12 Jun 2025), Kernel Bugs (Santana et al., 2022), and Bounds and Triggered Event in games (Truelove et al., 2021). The exact category set in each taxonomy depends on its context and the user workflows it targets.
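
Because the mapping in the table above is many-to-many (NPM's Intrinsic class absorbs most general categories, for instance), a crosswalk is easier to query as data than to read. The sketch below simply transcribes the table into a dictionary; the domain keys, `equivalents`, and `reverse_lookup` are hypothetical names, and the substring match in `reverse_lookup` is a convenience assumption rather than a rule from any of the studies.

```python
from __future__ import annotations

# Transcribed from the mapping table. Keys are Catolino et al. root-cause categories;
# None marks "not explicitly classed" or "not in taxonomy".
DOMAINS = ("npm", "quantum", "jupyter", "games")
CROSSWALK: dict[str, tuple[str | None, ...]] = {
    "Configuration": ("Intrinsic/Extrinsic", "Compatibility, Syntax",
                      "Environments & Settings (ES)", "Implementation Resp."),
    "Performance": ("Intrinsic", "Performance", "Processing (PC)", "Performance"),
    "Security": ("Intrinsic", "Security", "Environment/Extension (ES)", "Security"),
    "GUI": ("Intrinsic", "Usability", "Kernel, Cell Defect (CD)", "UI"),
    "Program/Functional": ("Intrinsic", "Functional, Logical",
                           "Implementation (IP)", "Action, AI, Bounds"),
    "Test Code": ("Not-a-Bug", "Test Coverage", None, None),
    "Network": ("Extrinsic", "Communication", "Connection (CN)", None),
    "Database": ("Intrinsic", "Database", "Connection/Processing", None),
}


def equivalents(category: str) -> dict[str, str]:
    """Counterparts of a general category in each domain taxonomy that has one."""
    row = CROSSWALK[category]
    return {domain: label for domain, label in zip(DOMAINS, row) if label is not None}


def reverse_lookup(domain: str, label: str) -> list[str]:
    """General categories whose mapping in `domain` mentions `label`."""
    idx = DOMAINS.index(domain)
    return [cat for cat, row in CROSSWALK.items() if row[idx] and label in row[idx]]


print(reverse_lookup("npm", "Extrinsic"))  # ['Configuration', 'Network']
```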

3. Methodologies for Taxonomy Construction and Validation

Empirical taxonomies are developed through manual annotation, open/axial coding, triangulation with developer feedback, and automated classifier benchmarking.

Taxonomies are iteratively refined as evolving codebases introduce new failure modes and as empirical inter-rater agreement guides merging or splitting of classes.
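
Inter-rater agreement is commonly quantified with Cohen's kappa, which discounts the agreement two annotators would reach by chance given how often each uses every label. A minimal pure-Python sketch follows; the example labels are invented for illustration, and the kappa threshold at which a study decides to merge or split classes is its own judgment, not something this function decides.

```python
from __future__ import annotations

from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout (kappa undefined)
    return (p_o - p_e) / (1 - p_e)


rater_a = ["Kernel", "Kernel", "Implementation", "Implementation"]
rater_b = ["Kernel", "Implementation", "Implementation", "Implementation"]
print(cohens_kappa(rater_a, rater_b))  # 0.5
```

Here the raters agree on 75% of items, but half of that agreement is expected by chance, which is why kappa is only 0.5.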

4. Quantitative Patterns and Comparative Empirical Findings

Quantitative analyses of taxonomy categories expose defect prevalence, severity, recurrence, and ecosystem fragilities:

  • Distribution and Growth: Implementation bugs dominate Jupyter bugs mined from GitHub commits (44.2%, vs. 22% of Stack Overflow posts), while Environments/Settings bugs dominate Stack Overflow posts (43.2%) (Santana et al., 2022). In quantum software, classical bugs comprise 67.2% of issues and quantum-specific bugs 27.3% (Yousuf et al., 12 Jun 2025).
  • Resolution Dynamics: In NPM, Intrinsic bugs resolve faster (median 8.9 days) and more frequently require code modification (56.7% vs. 28.1%) than Extrinsic bugs (median 10.2 days; Mann–Whitney $U$, $p < 0.05$) (Wright et al., 13 Feb 2026).
  • Severity and Recurrence: In the game taxonomy, Crash bugs are among the most severe (severity 38%) and the most recurrent, while Camera, Event Occurrence, and Exploit bugs are the least severe (<10%) (Truelove et al., 2021). Jupyter kernel bugs are infrequent in commits (2.9%) but have high end-user impact (universal user frustration) (Santana et al., 2022).
  • Root Causes and Impacts: Installation/configuration problems and version mismatches predominate in Jupyter (Install & Config: 32.1% SO, 16.3% GH; Version: 19.0% SO, 22.5% GH) (Santana et al., 2022). Program (functional) anomalies are the most frequent category in the open-source root-cause taxonomy (41%) (Catolino et al., 2019).
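
The NPM resolution-time comparison above relies on a Mann–Whitney $U$ test, a rank-based test suited to skewed quantities such as resolution times. A self-contained sketch using the normal approximation with tie correction follows; the sample durations are invented, not the study's data. In practice `scipy.stats.mannwhitneyu` is the usual choice; for small samples it may compute an exact p-value and by default applies a continuity correction, so its result can differ from this approximation.

```python
from __future__ import annotations

import math


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U for sample x, two-sided p-value) via the tie-corrected normal approximation."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which one (0 = x, 1 = y) each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # every value identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


intrinsic_days = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical resolution times
extrinsic_days = [6.0, 7.0, 8.0, 9.0, 10.0]
u, p = mann_whitney_u(intrinsic_days, extrinsic_days)  # u == 0.0, p is about 0.009
```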

Severity metrics in quantum software show 93.7% of issues as Low, only 4.3% as Critical (Yousuf et al., 12 Jun 2025). Recurrence analysis in games relies on cosine-similarity clustering and manual vetting for true repeat bug types (Truelove et al., 2021).
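
Similarity-based recurrence detection can be sketched as TF-IDF vectors over patch-note text plus pairwise cosine similarity, with pairs above a threshold passed on for manual vetting. This is a simplified pairwise-thresholding stand-in, not the clustering pipeline of Truelove et al.; the tokenizer, the scikit-learn-style smoothed IDF, the 0.5 threshold, and the example notes are all assumptions.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors: raw term count times smoothed IDF ln((1+n)/(1+df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def candidate_recurrences(notes: list[str], threshold: float = 0.5) -> list[tuple[int, int, float]]:
    """Index pairs of notes similar enough to vet manually as possible repeat bugs."""
    vecs = tfidf_vectors(notes)
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            s = cosine(vecs[i], vecs[j])
            if s >= threshold:
                pairs.append((i, j, round(s, 3)))
    return pairs


notes = [
    "Fixed a crash when loading a saved game",
    "Fixed crash when loading saved game in multiplayer",
    "Adjusted camera angle in cutscenes",
]
print(candidate_recurrences(notes))  # [(0, 1, 0.565)]
```

The quadratic pairwise loop is fine for a single game's patch history; larger corpora would call for an indexed nearest-neighbor search instead.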

5. Classification Criteria, Key Metrics, and Automation

Formalized criteria, metrics, and automation pipelines underpin objective bug triage and analysis:

  • Classification Criteria: issue text; code-change evidence (e.g., a PR within 7 days for InEx-Bug (Wright et al., 13 Feb 2026)); dependency and environment context; and root-cause signals from stack traces and reproduction narratives.
  • Temporal and Behavioral Metrics:
    • Median Resolution Time: $M_{\text{res}}(C) = \text{median}_{i \in C_{\text{closed}}} \big( (t_i^{\text{close}} - t_i^{\text{open}}) / 1\,\text{day} \big)$.
    • Reopen Rate: $R_{\text{reopen}}(C) = \frac{|\{ i \in C : i \text{ reopened} \}|}{|C_{\text{closed}}|}$.
    • Recurrence Delay: $D_{\text{rec}}(C) = \text{median}_{i \text{ reopened in } C} \big( (t_i^{\text{reopen}} - t_i^{\text{close}}) / 1\,\text{day} \big)$ (Wright et al., 13 Feb 2026).
  • Automated Tools: Rule-based NLP frameworks (quantum) and logistic regression, SVM, or random forest classifiers built on TF-IDF features achieve high agreement on bug-type, category, and quality-attribute inference; severity assignment remains harder (quantum Cohen's kappa 0.162 for severity vs. 0.712–0.826 for the other axes) (Yousuf et al., 12 Jun 2025).
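
The three temporal metrics translate directly into code. The sketch below assumes a hypothetical `Issue` record holding the open time, the first close time, and the first reopen time; real trackers (and the study's own pipeline) may model repeated close/reopen cycles differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from statistics import median

SECONDS_PER_DAY = 86_400


@dataclass
class Issue:
    opened: datetime
    closed: datetime | None = None    # first close time; None if never closed
    reopened: datetime | None = None  # first reopen after that close, if any


def _days(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / SECONDS_PER_DAY


def median_resolution_days(issues: list[Issue]) -> float:
    """M_res(C): median open-to-close time in days; raises StatisticsError if none closed."""
    return median([_days(i.opened, i.closed) for i in issues if i.closed is not None])


def reopen_rate(issues: list[Issue]) -> float:
    """R_reopen(C): number of reopened issues divided by number of closed issues."""
    n_closed = sum(1 for i in issues if i.closed is not None)
    n_reopened = sum(1 for i in issues if i.reopened is not None)
    return n_reopened / n_closed if n_closed else 0.0


def recurrence_delay_days(issues: list[Issue]) -> float | None:
    """D_rec(C): median close-to-reopen time in days over reopened issues."""
    delays = [_days(i.closed, i.reopened) for i in issues
              if i.closed is not None and i.reopened is not None]
    return median(delays) if delays else None


issues = [
    Issue(datetime(2026, 1, 1), datetime(2026, 1, 3)),                       # 2 days
    Issue(datetime(2026, 1, 1), datetime(2026, 1, 11), datetime(2026, 1, 15)),  # 10 days, reopened after 4
    Issue(datetime(2026, 1, 5), datetime(2026, 1, 9)),                       # 4 days
    Issue(datetime(2026, 1, 7)),                                             # still open
]
print(median_resolution_days(issues), reopen_rate(issues), recurrence_delay_days(issues))
```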

Process automation has recently been extended to multidimensional label assignment and is used for real-time triage and health monitoring in large projects and ecosystems.
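
Rule-based multidimensional labeling can be sketched as keyword matching run once per axis. The keyword lists below are invented placeholders, not the rules of the quantum study's framework; a production labeler would derive them from annotated issues and would likely be combined with a trained classifier such as those listed above.

```python
from __future__ import annotations

import re

# Hypothetical keyword rules per axis; order within an axis sets priority.
RULES: dict[str, dict[str, list[str]]] = {
    "bug_type": {
        "quantum": ["qubit", "gate", "circuit", "measurement"],
        "classical": ["exception", "install", "import", "typeerror"],
    },
    "quality_attribute": {
        "Reliability": ["crash", "hang", "exception", "incorrect"],
        "Usability": ["confusing", "unclear", "error message"],
        "Performance efficiency": ["slow", "memory", "timeout"],
    },
}
SINGLE_LABEL_AXES = {"bug_type"}  # one bug type per issue; other axes are multi-label


def _matches(text: str, keywords: list[str]) -> bool:
    # Whole-word match so that, e.g., "gate" does not fire on "aggregate".
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords)


def label_issue(text: str) -> dict[str, list[str]]:
    """Assign labels on every axis; an axis with no keyword hit gets 'uncategorized'."""
    lowered = text.lower()
    labels: dict[str, list[str]] = {}
    for axis, classes in RULES.items():
        hits = [cls for cls, keywords in classes.items() if _matches(lowered, keywords)]
        if axis in SINGLE_LABEL_AXES:
            hits = hits[:1]  # keep only the highest-priority match
        labels[axis] = hits or ["uncategorized"]
    return labels


print(label_issue("Transpiling a circuit with a custom gate is very slow"))
# {'bug_type': ['quantum'], 'quality_attribute': ['Performance efficiency']}
```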

6. Domain-Specific Implications and Best Practices

Taxonomies inform both empirical research and practical tooling:

  • Bug Triage and Assignment: Structured, automated labels (root-cause, origin) expedite assignment to domain specialists (e.g., route Extrinsic bugs upstream, fast-track security issues) (Wright et al., 13 Feb 2026, Catolino et al., 2019).
  • Testing Strategies: Analysis of recurrence and severity supports prioritization (emphasize Crash/Object Persistence in games, kernel/processing bugs in Jupyter, quantum circuit issues in Qiskit) (Truelove et al., 2021, Yousuf et al., 12 Jun 2025, Santana et al., 2022).
  • Ecosystem Health and Maintenance: Rising Extrinsic bug rate signals dependency/API fragility in NPM; nuanced metrics guide integration-test deployment post-upgrades (Wright et al., 13 Feb 2026). In Jupyter, the prevalence of ES and Implementation bugs motivates built-in environment checkers, version management, and advanced linting (Santana et al., 2022).
  • Code Quality and Process Improvements: Recurrent and severe bugs in games arise from insufficient multi-component/interaction testing, complex data dependencies, and weak telemetry (Truelove et al., 2021). Greater architectural modularity and standardized issue templates improve root-cause observability and reduce Not-a-Bug noise (Wright et al., 13 Feb 2026).
  • Research Tooling Needs: Integration of root-cause prediction into triage platforms, domain-specific static analysis (configuration linters, quantum circuit verifiers), and visual diffs for literate notebook environments are identified as priority advancements (Santana et al., 2022, Yousuf et al., 12 Jun 2025, Catolino et al., 2019).
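
Routing on structured labels, as described under Bug Triage and Assignment, can be sketched as a small decision function. The `TriagedIssue` record, the queue names, and the precedence (security checked before origin, so an Extrinsic security bug is still fast-tracked) are illustrative assumptions, not a policy taken from the cited studies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TriagedIssue:
    id: int
    origin: str      # InEx-Bug class: "Intrinsic", "Extrinsic", "Not-a-Bug", "Unknown"
    root_cause: str  # e.g. "security", "configuration", "program anomaly"


def route(issue: TriagedIssue) -> str:
    """Map structured labels to a (hypothetical) triage queue name."""
    if issue.root_cause == "security":
        return "security-fast-track"   # fast-track security issues regardless of origin
    if issue.origin == "Extrinsic":
        return "upstream-dependency"   # forward to the dependency's maintainers
    if issue.origin == "Not-a-Bug":
        return "close-or-convert"      # e.g. support question or feature request
    if issue.origin == "Unknown":
        return "needs-reproduction"
    # Intrinsic: send to the specialists for this root-cause category.
    return "team-" + issue.root_cause.replace(" ", "-")


print(route(TriagedIssue(1, "Intrinsic", "program anomaly")))  # team-program-anomaly
```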

7. Comparative Limitations and Future Extensions

Limitations in current taxonomies motivate ongoing research:

  • Coverage and Generalizability: Some taxonomies derive from open-source, public projects and may not extend directly to closed-source or specialized industrial contexts (Catolino et al., 2019).
  • Feature Scope: Automated classifiers limited to text summaries underperform on subtle categories; inclusion of comments, tracebacks, patches, and code metrics is advocated (Catolino et al., 2019, Yousuf et al., 12 Jun 2025).
  • Taxonomic Refinement: High-volume categories often mask distinct error modes (e.g., “Program Anomaly” or “Implementation”), suggesting need for finer-grained decomposition over time (Catolino et al., 2019, Santana et al., 2022).
  • Domain Evolution: Emerging technologies (quantum software, notebooks-in-production) and shifting development paradigms (microservices, multiparadigm environments) necessitate regular taxonomy revision.

Continued triangulation among empirical mining, user studies, and automated inferential tools is fundamental to sustaining the relevance and efficacy of bug taxonomies for software engineering research and practice.
