Testing for Fault Diversity in Reinforcement Learning (2403.15065v1)

Published 22 Mar 2024 in cs.SE

Abstract: Reinforcement Learning is the premier technique to approach sequential decision problems, including complex tasks such as driving cars and landing spacecraft. Among software validation and verification practices, testing for functional fault detection is a convenient way to build trustworthiness in the learned decision model. While recent works seek to maximise the number of detected faults, none consider characterising the faults during the search in order to promote diversity. We argue that policy testing should not aim at finding as many failures as possible (e.g., inputs that trigger similar car crashes) but rather at revealing faults in the model that are as informative and diverse as possible. In this paper, we explore the use of quality diversity optimisation to solve the problem of fault diversity in policy testing. Quality diversity (QD) optimisation is a family of evolutionary algorithms for hard combinatorial optimisation problems where high-quality, diverse solutions are sought. We define and address the underlying challenges of adapting QD optimisation to the testing of action policies. Furthermore, we compare classical QD optimisers to state-of-the-art frameworks dedicated to policy testing, both in terms of search efficiency and fault diversity. We show that QD optimisation, while being conceptually simple and generally applicable, effectively finds more diverse faults in the decision model, and conclude that QD-based policy testing is a promising approach.
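
To make the QD idea concrete, below is a minimal MAP-Elites-style sketch of policy testing for fault diversity. This is not the paper's implementation (the authors evaluate several QD optimisers against dedicated policy-testing frameworks); `run_episode`, `sample_params`, and `mutate` are hypothetical stand-ins for a concrete test harness, and the behaviour descriptor is assumed to be normalised to [0, 1].

```python
import random
from dataclasses import dataclass

@dataclass
class Elite:
    params: object     # environment configuration, i.e. the test input
    fitness: float     # solution quality, e.g. how close the episode came to failing
    failed: bool       # whether the episode triggered a functional fault

def discretise(behaviour, bins=10):
    """Map a behaviour descriptor (assumed normalised to [0, 1]) to a grid cell."""
    return tuple(min(int(b * bins), bins - 1) for b in behaviour)

def map_elites_testing(run_episode, sample_params, mutate, budget=1000):
    """MAP-Elites-style search for diverse policy faults.

    `run_episode(params)` must return (failed, fitness, behaviour), where
    `behaviour` describes *how* the episode unfolded (e.g. a lander's final
    position), not just whether it failed.
    """
    archive = {}  # behaviour cell -> best Elite found in that niche
    for _ in range(budget):
        if not archive or random.random() < 0.2:
            params = sample_params()          # explore: fresh random test input
        else:
            elite = random.choice(list(archive.values()))
            params = mutate(elite.params)     # exploit: perturb a stored elite
        failed, fitness, behaviour = run_episode(params)
        cell = discretise(behaviour)
        # One elite per niche: diversity is enforced structurally by the grid.
        if cell not in archive or fitness > archive[cell].fitness:
            archive[cell] = Elite(params, fitness, failed)
    # At most one fault per behaviour niche, i.e. a diverse fault set.
    return [e for e in archive.values() if e.failed]
```

The archive is what distinguishes this from plain failure-count maximisation: two failures only count separately if they land in different behaviour niches, which is exactly the fault-diversity objective the paper argues for.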
