
BayesFLo: Bayesian fault localization of complex software systems (2403.08079v2)

Published 12 Mar 2024 in cs.SE and stat.ME

Abstract: Software testing is essential for the reliable development of complex software systems. A key step in software testing is fault localization, which uses test data to pinpoint failure-inducing combinations for further diagnosis. Existing fault localization methods have two key limitations: they (i) do not incorporate domain and/or structural knowledge from test engineers, and (ii) do not provide a probabilistic assessment of risk for potential root causes. Such methods can thus fail to confidently whittle down the combinatorial number of potential root causes in complex systems, resulting in prohibitively high testing costs. To address this, we propose a novel Bayesian fault localization framework called BayesFLo, which leverages a flexible Bayesian model for identifying potential root causes with probabilistic uncertainty. Using a carefully specified prior on root cause probabilities, BayesFLo permits the integration of domain and structural knowledge via the principles of combination hierarchy and heredity, which capture the expected structure of failure-inducing combinations. We then develop new algorithms for efficient computation of posterior root cause probabilities, leveraging recent tools from integer programming and graph representations. Finally, we demonstrate the effectiveness of BayesFLo over existing methods in two fault localization case studies on the Traffic Alert and Collision Avoidance System and the JMP Easy DOE platform.
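
To make the idea of posterior root cause probabilities concrete, here is a minimal Python sketch on a toy problem. It is not the BayesFLo algorithm: it assumes independent Bernoulli priors that depend only on combination order (a crude stand-in for combination hierarchy, with heredity omitted), assumes a test fails exactly when it contains at least one root cause, and computes the posterior by brute-force enumeration rather than by the integer programming and graph tools the abstract mentions. The function names, prior values, and test runs are all hypothetical.

```python
from itertools import combinations, product


def combos_in(run, max_order=2):
    """All factor-level combinations of order <= max_order contained in a run.

    A run is a tuple of levels, one per factor. Each combination is a tuple
    of (factor_index, level) pairs.
    """
    items = list(enumerate(run))
    found = set()
    for order in range(1, max_order + 1):
        found.update(combinations(items, order))
    return found


def posterior_root_cause_probs(runs, failed, prior_by_order):
    """Posterior P(combination is a root cause | test outcomes), by enumeration.

    Assumptions (illustrative only): independent priors keyed by combination
    order, and a run fails if and only if it contains a root cause.
    """
    covered = [combos_in(r, max(prior_by_order)) for r in runs]
    # Any combination seen in a passing run cannot be a root cause.
    cleared = set().union(*(c for c, f in zip(covered, failed) if not f))
    in_failed = set().union(*(c for c, f in zip(covered, failed) if f))
    candidates = sorted(in_failed - cleared)
    fail_sets = [c & set(candidates) for c, f in zip(covered, failed) if f]

    total = 0.0
    marginal = {c: 0.0 for c in candidates}
    for mask in product([0, 1], repeat=len(candidates)):
        active = {c for c, m in zip(candidates, mask) if m}
        # Every failed run must be explained by at least one active root cause.
        if not all(active & fs for fs in fail_sets):
            continue
        weight = 1.0
        for c, m in zip(candidates, mask):
            p = prior_by_order[len(c)]
            weight *= p if m else 1.0 - p
        total += weight
        for c in active:
            marginal[c] += weight
    if total == 0.0:
        raise ValueError("no root cause scenario is consistent with the outcomes")
    return {c: marginal[c] / total for c in candidates}


# Hypothetical data: three two-level factors (A, B, C), three test runs.
runs = [(0, 0, 0), (0, 1, 1), (1, 0, 1)]
failed = [False, False, True]
# Lower-order combinations get a larger prior (a simple form of hierarchy).
prior_by_order = {1: 0.2, 2: 0.05}

post = posterior_root_cause_probs(runs, failed, prior_by_order)
for combo, prob in sorted(post.items(), key=lambda kv: -kv[1]):
    print(combo, round(prob, 3))
```

In this toy setting the passing runs rule out every combination except A=1, (A=1, B=0), (A=1, C=1), and (B=0, C=1). The single-factor candidate A=1 receives the largest posterior (about 0.64, against about 0.16 for each two-factor candidate) purely because of its larger prior. Enumeration over all subsets grows exponentially in the number of candidates, which is why a method aimed at complex systems needs more efficient posterior computation.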
