Towards Effective Bug Triage with Software Data Reduction Techniques (1704.04761v1)

Published 16 Apr 2017 in cs.SE

Abstract: Software companies spend over 45 percent of cost in dealing with software bugs. An inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost in manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on totally 600,000 bug reports of two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging techniques on data processing to form reduced and high-quality bug data in software development and maintenance.

Citations (166)

Summary

  • The paper demonstrates that applying software data reduction techniques notably improves the accuracy of automated bug triage.
  • Empirical tests on large datasets show data reduction enhances triage accuracy by 1% to 12% when using Naive Bayes.
  • The study also presents a predictive model to optimize the order of reduction techniques for further accuracy gains.

Analyzing Data Reduction Techniques for Enhanced Bug Triage

The complexity and cost of managing software bugs have long been a concern for software companies, particularly in bug triage, the step of identifying the appropriate developer to resolve a newly reported bug. Noting that software companies spend over 45% of their costs on dealing with bugs, Xuan et al. explore data reduction techniques aimed at improving bug triage. The paper combines instance selection with feature selection to reduce the data scale on both the bug dimension and the word dimension while improving data quality.

Advances in Bug Triage Efficiency

The paper builds on the shift from manual bug triage to automated triage based on text classification. These methods recast bug triage as a classification problem: each bug report is a document, and the developer who should fix it is the label to predict. Existing automated approaches are held back by the large scale and low quality of bug data, which motivates reduced and refined bug data sets for better triage accuracy.
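To make the classification framing concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier that assigns a developer to a bug report. It is an illustration only, not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the function names, and the four toy reports are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization, no stemming or stop words
    return text.lower().split()


def train_naive_bayes(reports):
    """Count words per developer from (text, developer) pairs."""
    dev_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, dev in reports:
        dev_counts[dev] += 1
        for word in tokenize(text):
            word_counts[dev][word] += 1
            vocab.add(word)
    return dev_counts, word_counts, vocab


def predict(model, text):
    """Return the developer with the highest log-posterior for the report text."""
    dev_counts, word_counts, vocab = model
    total_reports = sum(dev_counts.values())
    best_dev, best_score = None, -math.inf
    for dev in sorted(dev_counts):
        score = math.log(dev_counts[dev] / total_reports)
        # Add-one (Laplace) smoothing over the training vocabulary
        denom = sum(word_counts[dev].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[dev][word] + 1) / denom)
        if score > best_score:
            best_dev, best_score = dev, score
    return best_dev


# Toy history of fixed bugs (invented for illustration)
history = [
    ("crash when opening editor window", "alice"),
    ("editor window freezes on save", "alice"),
    ("network timeout during sync", "bob"),
    ("sync fails with network error", "bob"),
]
model = train_naive_bayes(history)
print(predict(model, "editor crash on save"))  # alice
print(predict(model, "network sync error"))    # bob
```

In this framing, every historical report adds rows to the training data and every distinct word adds a feature, which is why the scale and noise of the bug repository directly affect the classifier.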

Empirical Analysis

The authors evaluate data reduction on a total of 600,000 bug reports from the Eclipse and Mozilla projects. The results show that data reduction improves the accuracy of bug triage. Specifically, with Naive Bayes as the classifier, removing 50% of the bug reports and 70% of the words yields accuracy improvements ranging from 1% to 12%.
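The two reduction steps can be sketched as follows, using the same ratios (keep 30% of the words, keep 50% of the reports). This summary does not name the specific selection algorithms, so both scoring rules below are hypothetical stand-ins chosen for brevity: words are ranked by how strongly they concentrate on one developer, and reports are ranked per developer by how many distinct retained words they contain.

```python
from collections import Counter, defaultdict


def select_features(reports, keep_ratio):
    """Keep the top keep_ratio of words; reports are (word_list, developer) pairs."""
    doc_freq = defaultdict(Counter)  # word -> developer -> number of reports containing it
    for words, dev in reports:
        for word in set(words):
            doc_freq[word][dev] += 1

    def score(word):
        # Stand-in score: reports from the dominant developer minus all other reports
        top = max(doc_freq[word].values())
        return 2 * top - sum(doc_freq[word].values())

    ranked = sorted(doc_freq, key=lambda w: (-score(w), w))
    keep = set(ranked[:max(1, int(len(ranked) * keep_ratio))])
    return [([w for w in words if w in keep], dev) for words, dev in reports]


def select_instances(reports, keep_ratio):
    """Keep the top keep_ratio of reports per developer, preferring word-rich reports."""
    by_dev = defaultdict(list)
    for index, (_, dev) in enumerate(reports):
        by_dev[dev].append(index)
    kept = []
    for indices in by_dev.values():
        indices.sort(key=lambda i: (-len(set(reports[i][0])), i))
        kept.extend(indices[:max(1, int(len(indices) * keep_ratio))])
    return [reports[i] for i in sorted(kept)]


# Toy tokenized bug reports (invented for illustration)
reports = [
    (["editor", "crash", "on", "save"], "alice"),
    (["editor", "window", "freezes"], "alice"),
    (["network", "timeout", "on", "sync"], "bob"),
    (["sync", "fails", "network", "error"], "bob"),
]

# Feature selection first, then instance selection
reduced = select_instances(select_features(reports, 0.3), 0.5)
print(reduced)  # [(['editor'], 'alice'), (['network', 'sync'], 'bob')]
```

Selecting instances per developer is a design choice made here so that no developer disappears from the reduced data; the paper's own instance selection may behave differently.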

Predictive Modeling for Reduction Order

A distinctive aspect of this research is a predictive model that determines the order in which to apply instance selection and feature selection. The authors show that the order of these two reduction steps directly affects triage accuracy, so choosing it well matters for managing bug repositories. The model is built from attributes extracted from historical bug data sets and reaches a prediction accuracy of 71.8% in experimental validation. No single attribute determines the reduction order on its own; the attributes contribute to the prediction collectively.
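The idea of predicting the reduction order from data set attributes can be sketched as below. Everything specific in this example is an assumption: the four attributes, the 1-nearest-neighbour rule used as a stand-in classifier, the order labels "FS->IS" and "IS->FS", and the historical values, which are invented toy numbers rather than results from the paper.

```python
import math


def extract_attributes(reports):
    """Describe a bug data set of (text, developer) pairs by a few hypothetical attributes."""
    words = [word for text, _ in reports for word in text.split()]
    return (
        len(reports),                      # number of bug reports
        len(set(words)),                   # vocabulary size
        len({dev for _, dev in reports}),  # number of developers
        len(words) / len(reports),         # average words per report
    )


def predict_order(history, attributes):
    """Return the order label of the most similar historical data set (1-NN)."""
    def distance(a, b):
        # Relative differences, so large-valued attributes do not dominate
        return math.sqrt(
            sum(((x - y) / max(abs(x), abs(y), 1)) ** 2 for x, y in zip(a, b))
        )
    return min(history, key=lambda item: distance(item[0], attributes))[1]


# Invented toy history: (attributes, order that worked best on that data set)
history = [
    ((1000, 5000, 20, 40.0), "FS->IS"),
    ((200, 800, 5, 12.0), "IS->FS"),
]

new_reports = [
    ("crash when opening editor window", "alice"),
    ("editor window freezes on save", "alice"),
    ("network timeout during sync", "bob"),
    ("sync fails with network error", "bob"),
]
attributes = extract_attributes(new_reports)
print(attributes)                          # (4, 15, 2, 4.75)
print(predict_order(history, attributes))  # IS->FS
```

Because the prediction uses all attributes jointly through the distance function, the sketch mirrors the observation that no single attribute decides the order.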

Practical Implications and Future Work

The insights garnered from this paper bear practical implications for software maintenance and highlight a path toward crafting efficient bug triage systems. There remains scope for future exploration, particularly in embedding detailed attributes within predictive models and refining reduction orders for software-specific tasks. The potential to enhance the quality of bug reports and simultaneously curtail labor costs presents promising avenues for further research and application in AI-driven software management.

The paper by Xuan et al. makes an important contribution to software maintenance by connecting data mining techniques with the practical management of software repositories, particularly in large-scale projects. By applying data reduction effectively, it lays the groundwork for improving bug triage and for further AI applications in software engineering.