
Evaluating Search-Based Software Microbenchmark Prioritization (2211.13525v4)

Published 24 Nov 2022 in cs.SE

Abstract: Ensuring that software performance does not degrade after a code change is paramount. A solution is to regularly execute software microbenchmarks, a performance testing technique similar to (functional) unit tests, which, however, often becomes infeasible due to extensive runtimes. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks or require complex performance models, drastically reducing their potential application. In this paper, we empirically evaluate single- and multi-objective search-based microbenchmark prioritization techniques to understand whether they are more effective and efficient than greedy, coverage-based techniques. For this, we devise three search objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that search algorithms (SAs) are only competitive with but do not outperform the best greedy, coverage-based baselines. However, a simple greedy technique utilizing solely the performance change history (without coverage information) is equally or more effective than the best coverage-based techniques while being considerably more efficient, with a runtime overhead of less than 1%. These results show that simple, non-coverage-based techniques are a better fit for microbenchmarks than complex coverage-based techniques.

Summary

  • The paper demonstrates that search-based genetic algorithms combining coverage-based and history-based objectives perform competitively with, but do not statistically surpass, the traditional greedy Total coverage baseline.
  • The study shows that the search-based techniques incur only marginal computational overhead, while a greedy technique based solely on historical performance changes is markedly more efficient, adding less than 1% runtime overhead.
  • The paper recommends non-coverage-based, greedy historical-change prioritization as a practical, low-overhead alternative for effective microbenchmark prioritization.

Evaluating Search-Based Software Microbenchmark Prioritization

The paper investigates the effectiveness and efficiency of search-based software microbenchmark prioritization (SBSMBP) techniques, which aim to detect performance changes in software systems sooner. Through an empirical comparison of search-based and greedy techniques, the paper evaluates how different prioritization objectives perform and addresses the need for efficient methods to cope with the extensive runtimes inherent in performance testing.

Summary

The authors investigate SBSMBP as a response to the lengthy execution times of microbenchmark suites, which are essential for ensuring that software performance remains stable after code changes yet are often too costly to execute in full. The paper compares search-based approaches against traditional greedy heuristics, notably those relying on code coverage as a proxy for change-detection ability. The studied techniques optimize three objectives: code coverage (to maximize), coverage overlap (to minimize), and historical performance change size (to maximize), as sketched below.
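As a concrete illustration, the following sketch shows one plausible, position-weighted formulation of the three objectives for a candidate benchmark ordering. It is not the authors' implementation: the data structures (per-benchmark coverage sets and change-history magnitudes) and the position-weighting scheme are assumptions made purely for illustration.

```python
# Illustrative sketch only: one plausible, position-weighted formulation of the
# three prioritization objectives; not the paper's actual implementation.

def coverage_objective(ordering, coverage):
    """Coverage (maximize): reward covering previously uncovered code early."""
    n = len(ordering)
    covered, score = set(), 0.0
    for pos, bench in enumerate(ordering):
        newly_covered = coverage[bench] - covered
        score += len(newly_covered) * (n - pos) / n  # earlier positions weigh more
        covered |= coverage[bench]
    return score

def overlap_objective(ordering, coverage):
    """Coverage overlap (minimize): penalize re-covering already covered code early."""
    n = len(ordering)
    covered, penalty = set(), 0.0
    for pos, bench in enumerate(ordering):
        penalty += len(coverage[bench] & covered) * (n - pos) / n
        covered |= coverage[bench]
    return penalty

def history_objective(ordering, change_history):
    """Historical change detection (maximize): reward scheduling benchmarks whose
    past runs showed large performance changes early in the ordering."""
    n = len(ordering)
    return sum(change_history.get(b, 0.0) * (n - pos) / n
               for pos, b in enumerate(ordering))
```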

The experimental setup spans 10 open-source Java projects, encompassing 1829 distinct benchmarks across 161 software versions. The experiments show that the best-performing search-based genetic algorithm (GA) techniques are not statistically more effective than the Total greedy baseline, which orders microbenchmarks by the amount of code they cover (see the sketch below).
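For contrast with the search-based techniques, here is a minimal sketch of a greedy Total-style baseline: benchmarks are ordered by the size of their coverage sets, largest first. The data structures and names are hypothetical; the paper's tooling is not reproduced here.

```python
def total_coverage_prioritization(benchmarks, coverage):
    """Greedy 'Total'-style baseline: execute benchmarks covering the most code first."""
    return sorted(benchmarks, key=lambda b: len(coverage[b]), reverse=True)

# Hypothetical usage with per-benchmark coverage sets (e.g., covered methods):
coverage = {"benchA": {"m1", "m2", "m3"}, "benchB": {"m2"}, "benchC": {"m1", "m4"}}
order = total_coverage_prioritization(coverage.keys(), coverage)
# -> ['benchA', 'benchC', 'benchB']
```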

Key Findings

  1. Benchmark Effectiveness: The genetic algorithm combining all three objectives (C-CO-CH) is competitive with the Total coverage baseline but does not surpass it. A greedy approach relying solely on the historical performance change objective (CH) is equally or more effective in terms of median effectiveness, without requiring any coverage information (see the sketch after this list).
  2. Efficiency Considerations: The search-based techniques add only marginal computational overhead compared to the greedy baselines. Greedy techniques relying on historical performance changes are markedly more efficient, introducing less than 1% runtime overhead across the studied projects.
  3. Implications for Practice: Non-coverage-based, greedy CH techniques are recommended for practitioners because of their low overhead and ease of implementation. The paper advocates non-coverage-based objectives as effective alternatives for software microbenchmark prioritization.
  4. Change-Awareness: Introducing change-awareness does not substantially affect SBSMBP effectiveness, suggesting that simpler, non-change-aware approaches suffice, simplifying implementation without sacrificing effectiveness.
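The sketch below illustrates the kind of history-based greedy technique these findings favor: benchmarks are ranked by the magnitude of performance changes observed in previous versions, with no coverage analysis at all. The change-magnitude values and aggregation are hypothetical; the paper's exact formulation is not reproduced here.

```python
def change_history_prioritization(benchmarks, change_history):
    """Greedy history-based prioritization: benchmarks whose earlier versions
    exhibited the largest performance changes run first; no coverage needed."""
    return sorted(benchmarks, key=lambda b: change_history.get(b, 0.0), reverse=True)

# Hypothetical change magnitudes (e.g., relative slowdowns seen in past versions)
history = {"benchA": 0.02, "benchB": 0.35, "benchC": 0.10}
print(change_history_prioritization(history.keys(), history))
# -> ['benchB', 'benchC', 'benchA']
```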

Implications and Future Directions

The findings underscore how difficult it is for search-based approaches to surpass traditional greedy methods in microbenchmark prioritization. The results motivate further exploration of alternative objectives that better capture important performance changes while relying less on code coverage and its associated overhead.

Future research could explore objectives that address performance changes at different levels of granularity or that focus on real-world performance faults and developer-reported issues. Moreover, algorithmic innovations tailored specifically to SBSMBP may yield new insights for enhancing prioritization strategies.

The paper contributes to the evolving discourse on performance testing methodologies, underscoring the importance of balancing effectiveness with computational efficiency and advocating for practical, low-overhead solutions in continuous integration environments.