Testing for Fault Diversity in Reinforcement Learning (2403.15065v1)
Abstract: Reinforcement Learning is the premier technique for approaching sequential decision problems, including complex tasks such as driving cars and landing spacecraft. Among software validation and verification practices, testing for functional fault detection is a convenient way to build trustworthiness in the learned decision model. While recent works seek to maximise the number of detected faults, none consider fault characterisation during the search, i.e., searching for diversity. We argue that policy testing should not aim to find as many failures as possible (e.g., inputs that trigger similar car crashes) but rather to reveal faults in the model that are as informative and diverse as possible. In this paper, we explore the use of quality diversity optimisation to solve the problem of fault diversity in policy testing. Quality diversity (QD) optimisation is a family of evolutionary algorithms for hard combinatorial optimisation problems in which high-quality, diverse solutions are sought. We define and address the underlying challenges of adapting QD optimisation to the testing of action policies. Furthermore, we compare classical QD optimisers with state-of-the-art frameworks dedicated to policy testing, in terms of both search efficiency and fault diversity. We show that QD optimisation, while conceptually simple and generally applicable, effectively finds more diverse faults in the decision model, and we conclude that QD-based policy testing is a promising approach.
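To make the QD idea concrete, the sketch below shows a minimal MAP-Elites-style loop, the canonical QD algorithm: an archive keeps, per behaviour cell, only the best ("elite") candidate seen, so the search simultaneously pursues quality within each cell and coverage across cells. This is an illustrative toy on a 1-D search space, not the paper's actual testing setup; the `fitness` and `behaviour` functions, bin count, and mutation scale are all hypothetical choices for the example.

```python
import random

def map_elites(fitness, behaviour, n_bins=10, iters=2000, seed=0):
    """Minimal MAP-Elites loop over a 1-D behaviour space.

    fitness(x)   -> quality of candidate x (higher is better)
    behaviour(x) -> descriptor in [0, 1), discretised into n_bins cells
    """
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, candidate)
    for _ in range(iters):
        if archive:
            # Select a random elite as parent and mutate with Gaussian noise.
            _, parent = rng.choice(list(archive.values()))
            x = parent + rng.gauss(0.0, 0.1)
        else:
            # Bootstrap the archive with a random candidate.
            x = rng.uniform(-1.0, 1.0)
        cell = min(int(behaviour(x) * n_bins), n_bins - 1)
        f = fitness(x)
        # Keep the candidate if its cell is empty or it beats the incumbent elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)
    return archive

# Toy problem: maximise -x^2 while covering behaviours spread over [-1, 1].
archive = map_elites(fitness=lambda x: -x * x,
                     behaviour=lambda x: max(0.0, min(0.999, (x + 1) / 2)))
print(len(archive))  # number of behaviour cells filled
```

In the policy-testing setting the paper studies, a candidate would instead encode a test input (e.g., an initial environment state), fitness would measure how close the policy comes to failing, and the behaviour descriptor would characterise *how* it fails, so that the archive accumulates diverse faults rather than many near-identical ones.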