
A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks (2208.12136v3)

Published 25 Aug 2022 in cs.SE and cs.LG

Abstract: Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL by implementing a DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms to facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains, including software testing. However, to the best of our knowledge, no study has empirically evaluated the effectiveness and performance of the algorithms implemented in DRL frameworks. Moreover, the literature lacks guidelines that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the application of carefully selected DRL algorithms to two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game and detect bugs. Results show that some of the selected DRL frameworks, such as Tensorforce, outperform recent approaches in the literature. To prioritize test cases, we run experiments in a CI environment where DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between implemented algorithms is considerable in some cases, motivating further investigation.
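To illustrate the test case prioritization idea the abstract describes, the sketch below trains a simple reward-driven agent that learns to rank test cases by how likely they are to fail. This is not the paper's method (the study uses full DRL frameworks such as Tensorforce in a real CI environment); it is a minimal, self-contained tabular analogue, and the failure probabilities are invented illustrative data.

```python
import random

# Hypothetical per-test failure probabilities (illustrative, not from the paper).
FAIL_PROB = [0.1, 0.8, 0.3, 0.6]

def run_test(case_id, rng):
    """Simulate executing a test case; reward 1.0 if it fails (bug exposed)."""
    return 1.0 if rng.random() < FAIL_PROB[case_id] else 0.0

def train(episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    """Epsilon-greedy value learning over test cases (bandit-style)."""
    rng = random.Random(seed)
    q = [0.0] * len(FAIL_PROB)  # estimated expected reward per test case
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.randrange(len(q))                   # explore
        else:
            a = max(range(len(q)), key=q.__getitem__)   # exploit
        reward = run_test(a, rng)
        q[a] += alpha * (reward - q[a])                 # incremental update
    return q

q = train()
# Prioritized order: run the tests most likely to fail first.
ranking = sorted(range(len(q)), key=lambda a: -q[a])
print(ranking)
```

In the paper's setting, the agent observes richer state (e.g., test history in a CI cycle) and a framework-provided DRL algorithm replaces this tabular update, but the reward structure, rewarding orderings that surface failures early, follows the same principle.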

Authors (3)
  1. Paulina Stevia Nouwou Mindom
  2. Amin Nikanjam
  3. Foutse Khomh
Citations (8)