
Bounding Random Test Set Size with Computational Learning Theory (2405.17019v2)

Published 27 May 2024 in cs.SE

Abstract: Random testing approaches work by generating inputs at random, or by selecting inputs randomly from some pre-defined operational profile. One long-standing question that arises in this and other testing contexts is as follows: When can we stop testing? At what point can we be certain that executing further tests in this manner will not explore previously untested (and potentially buggy) software behaviors? This is analogous to the question in Machine Learning of how many training examples are required to infer an accurate model. In this paper we show how probabilistic approaches to answering this question in Machine Learning (arising from Computational Learning Theory) can be applied in our testing context. This enables us to produce an upper bound on the number of tests that are required to achieve a given level of adequacy. We are the first to enable this knowing only the number of coverage targets (e.g. lines of code) in the source code, without needing to observe a sample of test executions. We validate this bound on a large set of Java units and on an autonomous driving system.
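As a rough illustration of how such a bound can be computed, the sketch below applies the classic PAC sample-complexity bound for a finite hypothesis space, m ≥ (ln|H| + ln(1/δ)) / ε. It is not taken from the paper: treating every subset of the n coverage targets as a hypothesis (so |H| = 2^n) is an assumption made here for illustration, and the function name and default parameters are hypothetical.

```python
import math


def bound_from_coverage_targets(n_targets: int,
                                epsilon: float = 0.05,
                                delta: float = 0.05) -> int:
    """Upper bound on the number of random tests, PAC style.

    Assumption (not from the paper): the hypothesis space is the set of
    all subsets of coverage targets, so ln|H| = n_targets * ln(2).
    epsilon is the tolerated error and 1 - delta the desired confidence.
    """
    if n_targets < 0:
        raise ValueError("n_targets must be non-negative")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")

    ln_hypotheses = n_targets * math.log(2)
    # Classic finite-hypothesis-space bound: (ln|H| + ln(1/delta)) / epsilon
    return math.ceil((ln_hypotheses + math.log(1 / delta)) / epsilon)


if __name__ == "__main__":
    # Example: a unit with 10 coverage targets, 10% error, 90% confidence.
    print(bound_from_coverage_targets(10, epsilon=0.1, delta=0.1))
```

Under these assumptions the bound grows linearly in the number of coverage targets and in 1/ε, and only logarithmically in 1/δ, which is why it can be computed before any test is executed.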

