
On the Need of Preserving Order of Data When Validating Within-Project Defect Classifiers (1809.01510v2)

Published 5 Sep 2018 in cs.SE

Abstract: [Context] The use of defect prediction models, such as classifiers, can support testing resource allocation by using data from previous releases of the same project to predict which software components are likely to be defective. A validation technique (hereinafter, technique) defines a specific way to split the available data into training and test sets in order to measure a classifier's accuracy. Time-series techniques have the unique ability to preserve the temporal order of the data, i.e., they prevent the test set from containing data antecedent to the training set. [Aim] The aim of this paper is twofold: first, we check whether there is a difference in classifier accuracy as measured by time-series versus non-time-series techniques; afterward, we check for a possible reason for this difference, i.e., whether defect rates change across releases of a project. [Method] Our method consists of measuring the accuracy, i.e., AUC, of 10 classifiers on 13 open and two closed projects using three validation techniques, namely cross validation, bootstrap, and walk-forward, where only the latter is a time-series technique. [Results] We find that the AUC of the same classifier used on the same project, when measured by 10-fold, varies compared to when measured by walk-forward in the range [-0.20, 0.22], and it is statistically different in 45% of the cases. Similarly, the AUC measured by bootstrap varies compared to when measured by walk-forward in the range [-0.17, 0.43], and it is statistically different in 56% of the cases. [Conclusions] We recommend choosing the technique by carefully considering the conclusions to be drawn, the properties of the available datasets, and the level of realism with respect to the classifier's usage scenario.
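The key methodological contrast in the abstract is between order-preserving (walk-forward) and order-ignoring (10-fold, bootstrap) validation. The sketch below is not the authors' code; it illustrates that contrast on synthetic, release-ordered data, with the dataset, features, and choice of logistic regression all being illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's implementation):
# walk-forward validation trains only on past releases, while 10-fold
# cross-validation mixes data from all releases, discarding temporal order.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_releases, per_release = 6, 200
X = rng.normal(size=(n_releases * per_release, 5))          # synthetic component metrics
y = rng.binomial(1, 0.3, size=n_releases * per_release)     # defective (1) or not (0)
release = np.repeat(np.arange(n_releases), per_release)     # release index = temporal order

clf = LogisticRegression(max_iter=1000)

# Walk-forward: for each release r, train on releases 0..r-1, test on release r.
wf_aucs = []
for r in range(1, n_releases):
    train, test = release < r, release == r
    clf.fit(X[train], y[train])
    wf_aucs.append(roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1]))

# 10-fold cross-validation: folds are drawn across all releases, so the test
# fold can contain data antecedent to the training data.
cv_aucs = cross_val_score(
    clf, X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="roc_auc",
)

print(f"walk-forward mean AUC: {np.mean(wf_aucs):.3f}")
print(f"10-fold mean AUC:      {np.mean(cv_aucs):.3f}")
```

On real defect data with drifting defect rates across releases, the two estimates can diverge in the way the paper reports; on this random synthetic data both should hover near 0.5.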

Citations (13)
