
ChatGPT and Human Synergy in Black-Box Testing: A Comparative Analysis (2401.13924v1)

Published 25 Jan 2024 in cs.SE

Abstract: In recent years, LLMs, such as ChatGPT, have been pivotal in advancing various artificial intelligence applications, including natural language processing and software engineering. A promising yet underexplored area is utilizing LLMs in software testing, particularly in black-box testing. This paper explores the test cases devised by ChatGPT in comparison to those created by human participants. In this study, ChatGPT (GPT-4) and four participants each created black-box test cases for three applications based on specifications written by the authors. The goal was to evaluate the real-world applicability of the proposed test cases, identify potential shortcomings, and comprehend how ChatGPT could enhance human testing strategies. ChatGPT can generate test cases that generally match or slightly surpass those created by human participants in terms of test viewpoint coverage. Additionally, our experiments demonstrated that when ChatGPT cooperates with humans, it can cover considerably more test viewpoints than each can achieve alone, suggesting that collaboration between humans and ChatGPT may be more effective than human pairs working together. Nevertheless, we noticed that the test cases generated by ChatGPT have certain issues that require addressing before use.
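
As context for the study design described above, the sketch below illustrates (not the authors' actual protocol) how black-box test cases might be requested from GPT-4 for a written specification via the OpenAI chat API, and how test-viewpoint coverage of human-written versus model-generated cases could be compared as a simple set union, mirroring the paper's finding that the combination covers more viewpoints than either alone. The specification text, prompt wording, and viewpoint labels are hypothetical placeholders.

```python
# Hypothetical sketch: prompting GPT-4 for black-box test cases and
# comparing "test viewpoint" coverage between human and model output.
# The spec, prompt, and viewpoint labels are placeholders, not the
# authors' experimental materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SPEC = """\
The login form accepts a user ID (4-12 alphanumeric characters) and a
password (8-64 characters). After three failed attempts the account is
locked for 30 minutes."""

def generate_test_cases(spec: str) -> str:
    """Ask the model for black-box test cases derived only from the spec."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a software tester performing black-box testing."},
            {"role": "user",
             "content": f"Specification:\n{spec}\n\n"
                        "List test cases covering equivalence partitions, "
                        "boundary values, and error handling."},
        ],
    )
    return response.choices[0].message.content

# Toy comparison of test-viewpoint coverage, modeled as sets of viewpoint labels.
human_viewpoints = {"valid login", "invalid password", "boundary: 12-char ID"}
gpt_viewpoints   = {"valid login", "account lockout", "boundary: 8-char password"}

combined = human_viewpoints | gpt_viewpoints
print(f"human alone: {len(human_viewpoints)}, GPT alone: {len(gpt_viewpoints)}, "
      f"combined: {len(combined)}")  # the union covers more viewpoints than either alone
```

In this toy setup the combined set has five viewpoints versus three for each source alone, which is the shape of the synergy effect the paper reports; the model's raw output would still need the manual review the authors recommend before use.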

Authors (2)
  1. Hiroyuki Kirinuki (1 paper)
  2. Haruto Tanno (1 paper)
Citations (2)