Is Hyper-Parameter Optimization Different for Software Analytics? (2401.09622v3)
Abstract: Yes. SE data can have "smoother" boundaries between classes than traditional AI data sets. More precisely, the magnitude of the second derivative of the loss function found in SE data is typically much smaller. A new hyper-parameter optimizer, called SMOOTHIE, can exploit this idiosyncrasy of SE data. We compare SMOOTHIE against a state-of-the-art AI hyper-parameter optimizer on three tasks: (a) GitHub issue lifetime prediction; (b) detecting false alarms in static code warnings; (c) defect prediction. For completeness, we also report experiments on some standard AI datasets. SMOOTHIE runs faster and predicts better on the SE data, but ties with the AI tool on the non-SE data. Hence we conclude that SE data can be different from other kinds of data, and those differences mean that we should use different kinds of algorithms for our data. To support open science and other researchers working in this area, all our scripts and datasets are available online at https://github.com/yrahul3910/smoothness-hpo/.
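To make the abstract's central quantity concrete: "smoothness" here refers to the magnitude of the second derivative of the loss function over the data. The sketch below is not the paper's SMOOTHIE implementation (see the linked repository for that); it is a minimal, hedged illustration of how one might estimate this quantity for a simple logistic loss, using a central finite-difference approximation of the directional second derivative along random unit directions as a cheap proxy for the spectral norm of the Hessian. All function names and parameters are illustrative.

```python
# Illustrative sketch (NOT the paper's SMOOTHIE code): estimate the
# "smoothness" of a loss landscape as the largest |second derivative|
# along random directions, via central finite differences.
import numpy as np

def logistic_loss(w, X, y):
    """Mean logistic loss of a linear model w on data (X, y), y in {0, 1}."""
    z = X @ w
    return np.mean(np.log1p(np.exp(-z * (2 * y - 1))))

def directional_second_derivative(loss, w, d, eps=1e-4):
    """Central-difference estimate of the second derivative of the loss
    along direction d at point w."""
    return (loss(w + eps * d) - 2 * loss(w) + loss(w - eps * d)) / eps**2

def smoothness_estimate(loss, w, n_dirs=32, seed=None):
    """Largest |second derivative| over random unit directions: a cheap
    proxy for the spectral norm of the Hessian at w."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_dirs):
        d = rng.standard_normal(w.shape)
        d /= np.linalg.norm(d)
        best = max(best, abs(directional_second_derivative(loss, w, d)))
    return best

# Toy usage on synthetic data: a small smoothness value would correspond
# to the "smoother" class boundaries the abstract attributes to SE data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0).astype(float)
w = rng.standard_normal(5)
print(smoothness_estimate(lambda v: logistic_loss(v, X, y), w, seed=1))
```

Under these assumptions, an optimizer could favor hyper-parameter configurations whose loss landscapes report smaller smoothness estimates; how SMOOTHIE actually exploits this property is specified in the paper and repository, not in this sketch.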